Introduction


Studies  have  shown  that  the  age  of  onset  of  breast  cancers  in  BRCA2  mutation  carriers  is  significantly  later  than 
for BRCA1 mutation carriers (Fodor et al., 1998). However, while the age-specific penetrance may differ, the
cumulative  lifetime  risk  appears  to  be  similar.  In  addition,  there  is  substantial  variation  in  the  age  of  onset  and 
the  site  of  cancer  amongst  BRCA1  and  BRCA2  mutation  carriers,  even  in  the  same  family  (Goldgar  et  al., 
1994).  This  strongly  suggests  that  genetic  and/or  environmental  modifiers  of  breast  cancer  risk  in  BRCA1/2 
mutation  carriers  exist  (Rebbeck  2002).  Certainly,  some  component  of  this  effect  is  due  to  differential  risks 
associated  with  different  mutations  in  the  genes  (Gayther  et  al.  1995,  1997).  However,  there  are  likely  to  be 
multiple  low-penetrance  genes  that  also  increase  the  susceptibility  to  breast  cancer.  Mutated  forms  of  these 
genes  probably  confer  only  a  small  to  moderate  increase  in  the  lifetime  breast  cancer  risk,  but  because  variations 
in  these  low  penetrance  genes  are  present  in  a  large  number  of  people,  the  population  risk  for  breast  cancer 
caused  by  these  genes  could  be  substantial  (Rebbeck  1999).  These  observations  raise  the  question  of  whether 
genes  associated  with  other  functions  of  BRCA1  and  BRCA2  might  also  be  modifiers  of  breast  cancer  risk  in 
carriers. 

Recent  findings  have  shown  that  both  BRCA1  and  BRCA2  are  involved  in  regulation  of  the  G2  to  M  phase 
transition  in  the  cell  cycle.  In  addition,  it  has  been  shown  that  both  proteins  are  localized  to  the  centrosome  and 
regulate  centrosome  duplication  and  centrosome  function  (Hsu  et  al.,  2001;  Nakanishi  et  al.,  2007).  Mutations 
in  BRCA1  and  BRCA2  are  correlated  with  aberrant  duplication  of  the  centrosome  leading  to  centrosome 
amplification,  chromosome  mis-segregation,  and  aneuploidy  (Xu  et  al.,  1999;  Deng  2002;  Starita  et  al.,  2004; 
Wu  et  al.,  2005).  Based  on  these  data,  we  questioned  whether  other  proteins  that  mediate  centrosome  function 
might  act  as  modifiers  of  breast  cancer  risk  in  BRCA1/2  carriers.  The  AURORA-A/AURKA/BTAK/STK15 
gene  encodes  a  centrosome-associated  kinase  that  causes  centrosome  amplification,  failure  of  cytokinesis,  and 
aneuploidy  when  amplified  and/or  overexpressed  in  breast  tumors.  STK15  is  also  known  to  bind  to  BRCA1  and 
BRCA2  (Ouchi  et  al.,  2004;  unpublished  data).  The  F31I  polymorphism  in  STK15  was  originally  identified  as  a 
candidate  lung  tumor  risk  modifier  locus  in  a  mouse  model  (Ewart-Toland  et  al.,  2003).  F31I  altered  the  activity 
of  the  Aurora  box-1  of  the  STK15  protein,  resulting  in  disruption  of  p53  binding  and  a  decreased  rate  of 
degradation  of  STK15.  The  stabilized  STK15  was  associated  with  centrosome  amplification  and  failure  of 
cytokinesis, increased chromosomal instability and aneuploidy, suggesting a direct effect of the F31I variant on
promotion of tumor formation (Ewart-Toland et al., 2003). In a study of incident breast cancer cases (n = 941)
and age-matched population controls (n = 830), Egan et al. (2004) found that Ile/Ile homozygotes were at
increased risk for breast cancer (OR = 1.54; 95% CI: 0.96-2.47), although this
finding was not significant. Sun et al. observed that the Ile-encoding allele is the common allele in the Chinese
population whereas the Phe-encoding allele is more common in Caucasian populations (Sun et al., 2004). In
addition, an association between Ile/Ile homozygotes and ER-negative breast carcinomas (OR = 2.56; 95% CI:
1.24-5.26) was detected. Lo et al. reported a significant association between AURKA haplotypes and breast
cancer risk (Lo et al., 2005). Ewart-Toland et al. also found an increase in cancer risk for the Ile/Ile
homozygotes (OR = 1.35; 95% CI: 1.12-1.64; p = 0.002) in a meta-analysis of data from four case-control
breast cancer populations (Ewart-Toland et al., 2005). Based on these data, we hypothesized that the F31I
polymorphism is associated with increased risk of breast cancer in BRCA1 and BRCA2 mutation carriers.

Since then, additional studies of STK15 F31I have been completed. Post-menopausal women homozygous for
the F31I and I57V alleles of AURKA in a case-control study nested within the Nurses' Health Study prospective
cohort had an increased risk of invasive breast cancer (OR = 1.63; 95% CI: 1.08-2.45) (Cox et al., 2006). In
contrast, Dai et al. did not observe a significant association with breast cancer risk for Ile/Ile homozygotes (OR
= 1.2; 95% CI: 0.9-1.6) in a population-based case-control series of Han Chinese (Dai et al., 2004), and Fletcher
et al. (2006) found no association between Ile/Ile homozygotes and risk of bilateral breast cancer
(OR = 0.63; 95% CI: 0.34-1.13).

Body

Aim  1.  To  validate  the  association  between  Val57Ile  in  STK15  and  increased  risk  of  breast  cancer  in  a  large 
cohort  of  BRCA1/2  mutation  carriers. 

Task  1.  We  began  the  study  by  genotyping  the  F31I  and  V57I  polymorphisms  in  DNAs  from  1332  carriers  of 
BRCA1  and  BRCA2  deleterious  mutations  that  were  provided  by  four  collaborating  groups. 

Task 2. No association with risk of breast cancer in BRCA1/2 mutation carriers was observed for F31I
heterozygotes (OR = 0.95; 95% CI 0.82-1.11) or for V57I heterozygotes (OR = 1.01; 95% CI 0.86-1.18) or
homozygotes (OR = 0.71; 95% CI 0.41-1.24). However, homozygosity for the F31I allele showed a trend toward
increased risk of breast cancer (OR = 1.23; 95% CI 0.93-1.63). This association, while not statistically
significant, was consistent when evaluating BRCA1 carriers (OR = 1.24; 95% CI 0.90-1.71) or BRCA2 carriers
(OR = 1.20; 95% CI 0.68-2.14) alone.
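
The odds ratios above are best read together with their confidence intervals: an interval that contains 1.0 is compatible with no effect. As a minimal sketch (not the analysis used in this report, whose estimates may well have come from adjusted models), the Python below computes an unadjusted odds ratio with a Woolf-type 95% confidence interval from a 2x2 table and checks whether an interval excludes the null value. The function names and the example counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Woolf (log-scale) confidence interval.

    a: cases with the genotype      b: cases without it
    c: controls with the genotype   d: controls without it
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def excludes_null(lower, upper, null_value=1.0):
    """True when the interval lies entirely above or below the null value."""
    return lower > null_value or upper < null_value


# Hypothetical counts, for illustration only.
print(odds_ratio_ci(20, 80, 10, 90))

# Intervals quoted in the text: the carrier estimate spans 1.0,
# the published meta-analysis estimate does not.
print(excludes_null(0.93, 1.63))
print(excludes_null(1.12, 1.64))
```

Applied to the intervals quoted in the text, the check reflects the wording used here: 0.93-1.63 spans 1.0, whereas the meta-analysis interval of 1.12-1.64 does not.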

Task  1.  In  an  effort  to  verify  the  association  of  F31I  with  breast  cancer  risk  in  BRCA1  and  BRCA2  carriers  we 
established  a  large  consortium  of  investigators  from  the  U.S.A.  and  Europe.  A  total  of  4935  female  BRCA1, 
2241  female  BRCA2  deleterious  mutation  carriers  and  11  individuals  carrying  both  BRCA1  and  BRCA2 
mutations  from  16  participating  groups  were  included  in  this  study.  Of  these  7187  mutation  carriers,  3884  had  a 
diagnosis  of  breast  cancer  at  the  end  of  follow  up  and  3303  were  censored  as  unaffected  at  a  mean  age  of  43.4 
years. 

Task  2.  To  avoid  overlap  between  studies  we  compared  carriers  by  country  of  origin,  year  of  birth,  mutation 
and  reported  ages.  The  frequency  of  the  recessive  Ile/Ile  encoding  genotype  in  the  16  groups  varied  between  3% 
and  8%,  which  is  similar  to  estimates  from  other  populations.  There  was  no  difference  in  the  frequency  of  the 
Ile/Ile  recessive  genotype  across  genotyping  platforms  (p=0.33).  Similarly,  the  study  sites  with  the  highest 
Ile/Ile frequencies did not have ethnic mixtures significantly different from the other study sites. The F31I
polymorphism  did  not  deviate  significantly  from  Hardy-Weinberg  equilibrium  (p=0.07)  among  all  7187 
affected  and  unaffected  carriers. 
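
The Hardy-Weinberg check mentioned above can be illustrated with a one-degree-of-freedom chi-square test on genotype counts. This is a sketch under stated assumptions: the report does not say which HWE test was used, the function name is hypothetical, and the counts in the example are invented.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium.

    n_aa, n_ab, n_bb are observed genotype counts for a biallelic SNP.
    Assumes both alleles are present, so every expected count is > 0.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented counts: the first set sits exactly on HWE proportions,
# the second has no heterozygotes at all.
print(hwe_chi_square(64, 32, 4))
print(hwe_chi_square(50, 0, 50))
```

A large p-value, such as the p=0.07 reported for F31I, gives no strong evidence of departure from equilibrium.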

The  estimated  risk  of  breast  cancer  associated  with  the  recessive  genotype  for  F31I  in  BRCA1  and  BRCA2 
carriers  was  calculated  using  a  weighted  Cox  proportional  hazards  model.  While  there  was  a  suggestion  of  a 
protective  effect  (HR  =  0.91;  95%CI  0.77-1.06)  overall,  the  result  was  not  statistically  significant.  Similarly,  no 
association  with  risk  was  observed  for  individual  participating  centers,  other  than  for  two  centers  that 
contributed small numbers of carriers to the study. A test for heterogeneity across study sites was not significant
(p=0.06). We also evaluated whether the Ile/Ile genotype was associated with risk of breast cancer in BRCA1
carriers alone or BRCA2 carriers alone. No significant association with risk was detected for either BRCA1 (HR
= 0.90; 95% CI 0.75-1.08) or BRCA2 carriers (HR = 0.93; 95% CI 0.67-1.29) (Couch et al., 2007). As other
studies have reported an association between the recessive Ile/Ile-encoding genotype and breast cancer risk in
postmenopausal non-carriers (Egan et al., 2003; Cox et al., 2007), we considered the influence of menopausal
status of carriers on breast cancer risk. At the end of follow-up, 4201 carriers were pre-menopausal and 2986
were post-menopausal. No significant association with risk was detected (Couch et al., 2007). Because prophylactic
oophorectomy substantially reduces the risk of breast cancer in BRCA1 and BRCA2 mutation carriers (Rebbeck
et al., 2002), we also evaluated the influence of prophylactic oophorectomy status. A total of 707 individuals
reported undergoing prophylactic oophorectomy, 4298 reported no history of oophorectomy, while 2182 (30%)
provided no data at last follow up. Associations with breast cancer risk by category of prophylactic
oophorectomy did not differ markedly from the overall results. Secondary analyses using a
two-degree-of-freedom general model also failed to detect a significant association for either a single copy
(p=0.97) or two copies (p=0.24) of the F31I polymorphism compared to no copies (Couch et al., 2007).

In an effort to account for possible survival bias and the inclusion of prevalent cases in the collection of BRCA1
and BRCA2 carriers, we repeated our analysis after excluding cases diagnosed more than three years prior to the
date of ascertainment. For this analysis we excluded records where an age at interview was not provided.
Overall, the mean difference between age of diagnosis and age at interview for the 3422 cases with available
data was 8.7 years. Of these, 1,322 (38.6%) cases had been diagnosed less than three years prior to the date of
ascertainment. When excluding prevalent cases, no association between the Ile/Ile genotype and breast cancer
risk was observed, and the risk estimates were similar to those obtained when using both prevalent and incident
cases (Couch et al., 2007). Thus, STK15 F31I does not appear to be associated with breast cancer risk
modification in BRCA1 or BRCA2 mutation carriers.

In  parallel,  we  worked  with  a  separate  consortium  of  investigators  to  assess  the  influence  of  STK15  F31I  on 
breast cancer risk in sporadic breast cancer cases. This consortium, the Breast Cancer Association
Consortium (BCAC), comprises 18 groups from the USA and Europe who are pooling genotyping data on
various polymorphisms in order to generate sufficient sample sizes for clarifying genetic risks associated with
these polymorphisms. We genotyped the F31I polymorphism in 724 breast cancer cases and 767 controls.
Cases were collected through the Mayo Clinic oncology clinic from 2002 to 2005 and were restricted to
Caucasians from a 6-state region surrounding the Mayo Clinic. Controls were recruited from Internal Medicine
Clinics at Mayo Clinic, had no previous history of breast cancer and were matched to cases by age and
residence. In the Mayo Clinic case-control study the F31I polymorphism was not associated with altered risk of
breast cancer for heterozygotes (OR = 1.00; 95% CI 0.80-1.24) or for homozygotes (OR = 0.95; 95% CI 0.59-1.52).
Similarly, when pooled with data from five other centers, no association with risk was observed for homozygotes
(OR = 0.98; 95% CI 0.92-1.04). Stratifying by age in order to consider postmenopausal women only also failed to
identify any association with risk. This report completes all effort associated with Tasks 1-2.

Task  3.  In  Aim  #1  we  also  stated  that  we  would  evaluate  single  nucleotide  polymorphisms  (SNPs)  in  other 
mitotic  regulators  for  effects  on  breast  cancer  risk  in  BRCA1  and  BRCA2  mutation  carriers.  We  have  now 
completed  a  large-scale  genotyping  study  of  798  breast  cancer  cases  and  840  controls  from  the  Mayo  Clinic  for 
polymorphisms  in  genes  encoding  regulators  of  mitosis.  The  Mayo  Clinic  Breast  Cancer  study  is  an  on-going 
clinic-based  case-control  study  initiated  in  February  2001  at  Mayo  Clinic,  Rochester,  MN.  Details  of  the  study 
design and data collection procedures have been previously described (14). Briefly, cases were
women  over  age  20  years  with  histologically  confirmed  primary  invasive  breast  carcinoma  who  were  enrolled 
within  six  months  of  date  of  diagnosis.  Controls  without  prior  history  of  cancer  (other  than  non-melanoma  skin 
cancer)  were  matched  on  age  (±  5  years)  and  region  of  residence  to  cases.  Controls  were  selected  from  the 
outpatient  clinic  in  the  Department  of  Internal  Medicine  at  Mayo  Clinic  where  they  were  seen  for  general 
medical  examinations.  Written  informed  consent  was  obtained  from  all  participants.  Case  participation  was 
69%  and  control  participation  was  71%.  The  present  analysis  genotyped  Caucasian  women  (99%  of  study 
participants)  enrolled  through  June  30,  2005,  representing  798  cases  and  843  controls.  Both  the  cases  and 
controls  completed  a  self-administered  risk  factor  questionnaire.  The  questionnaire  asked  about  known  or 
suspected  breast  cancer  risk  factors  including  lifestyle,  medical  and  reproductive  factors.  Cases  and  controls 
provided  blood  samples  from  which  genomic  DNA  was  isolated  using  standard  protocols.  The  samples  were  bar 
coded  to  ensure  accurate  and  reliable  sample  processing  and  storage. 

This  study  entailed  screening  the  dbSNP,  HapMap,  Perlegen,  Seattle  SNPs,  and  EGP  websites  for 
polymorphisms  in  273  different  mitotic  genes,  downloading  genotyping  data  and  selecting  tagged  SNPs  based 
on the LD Select program with a minor allele frequency (MAF) >0.05 and an r² >0.80. Coding SNPs were
selected  based  on  a  change  in  amino  acid  and  a  MAF  >0.05.  SNPs  located  in  promoter  regions  and  5’  and  3’ 
UTRs  were  also  included.  A  total  of  2,400  SNPs  from  the  273  genes  were  selected  and  genotyped  on  the  cases 
and  controls  along  with  5%  duplicate  samples.  Genotyping  was  highly  successful.  Only  165  SNPs  failed 
genotyping.  Of  the  remainder,  call  rates  for  genotypes  were  greater  than  98%.  Duplicates  demonstrated  100% 
concordance.  Only  two  SNPs  were  not  in  Hardy  Weinberg  Equilibrium  (HWE)  (p<0.05). 

Individual SNP associations for breast cancer risk were assessed using unconditional logistic regression to
estimate ORs and 95% CIs. Primary tests for associations were carried out assuming an ordinal (log-additive or
additive) genotypic relationship using simple tests for trend within the logistic and linear regression models. All
analyses were adjusted for the design variables of age and region of residence. We also examined the influence
of demographic or clinical variables and excluded those variables that were not statistically significant (p >
0.10) using a backward elimination selection approach, performed separately for risk and density analyses. A
total of 144 SNPs displayed significant association with breast cancer risk. When accounting for correlations
between SNPs in the same genes by random permutations of cases and controls, we found that these positive
associations represent a 30% increase over the number of associations expected by chance alone. Thus, several
SNPs in regulators of cell division may influence the risk of breast cancer in the population.
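
The permutation argument above can be sketched as follows. Shuffling case/control labels jointly across all SNPs preserves the correlation between SNPs while breaking any link to disease status, so the number of nominally significant SNPs in each shuffled dataset estimates what chance alone would produce. The sketch makes a simplifying assumption: it swaps in a plain allelic chi-square test (1 df) for the adjusted logistic-regression trend test described in the text. All function names and parameters are hypothetical.

```python
import math
import random


def allelic_p_value(genotypes, is_case):
    """Chi-square (1 df) p-value comparing minor-allele counts in cases and controls.

    genotypes: minor-allele counts (0, 1 or 2), one per subject.
    is_case:   booleans aligned with genotypes.
    """
    n_case = 2 * sum(is_case)
    n_ctrl = 2 * (len(is_case) - sum(is_case))
    case_alt = sum(g for g, c in zip(genotypes, is_case) if c)
    ctrl_alt = sum(g for g, c in zip(genotypes, is_case) if not c)
    table = [[case_alt, n_case - case_alt], [ctrl_alt, n_ctrl - ctrl_alt]]
    rows = [n_case, n_ctrl]
    cols = [case_alt + ctrl_alt, n_case + n_ctrl - case_alt - ctrl_alt]
    total = n_case + n_ctrl
    if 0 in rows or 0 in cols:
        return 1.0  # monomorphic SNP or empty group: nothing to test
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return math.erfc(math.sqrt(chi2 / 2))


def count_significant(snps, is_case, alpha=0.05):
    """Number of SNPs with a nominal p-value below alpha."""
    return sum(allelic_p_value(g, is_case) < alpha for g in snps)


def permutation_null(snps, is_case, n_perm=200, alpha=0.05, seed=0):
    """Counts of nominally significant SNPs under shuffled case/control labels."""
    rng = random.Random(seed)
    labels = list(is_case)
    counts = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # one shuffle applied to every SNP at once
        counts.append(count_significant(snps, labels, alpha))
    return counts
```

Comparing the observed count with the mean of `permutation_null(...)` gives the kind of excess-over-chance figure quoted above.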

We  subsequently  proposed  to  extend  these  findings  into  the  BRCA1  and  BRCA2  carrier  population. 
Specifically,  we  initiated  a  study  aimed  at  evaluating  the  144  SNPs  from  genes  involved  in  regulation  of  cell 
division as modifiers of breast cancer risk in mutation carriers. We selected the Illumina 384-SNP GoldenGate
array as the most cost-effective, high-quality genotyping platform for this study. This system is available in the
genotyping core of the Mayo Clinic. To make full use of the 384-SNP array we selected the 144 SNPs and also
selected SNPs from cell division control genes that commonly displayed associations with breast cancer risk in
two breast cancer genome-wide association studies conducted by Douglas Easton from Cambridge University
(Easton et al., 2007) and by the CGEMS group at NCI (Hunter et al., 2007). Through our collaboration with
Dr. Easton, we have access to genotyping data, odds ratios and p-values for all 12,026 SNPs that were evaluated
in stage 2 of the genome-wide study (Easton et al., 2007). In terms of CGEMS, odds ratios and p-values for all
SNPs in Stage 1 of the genome-wide study are publicly available.

We are in the process of ordering SNPs in Stage 2 of the Easton study and Stage 1 of CGEMS by strength of
association (p-value). All SNPs that reached both a significance of p<0.001 in the Easton study (approximately
700) and a significance of p<0.01 in CGEMS (5500) were selected. Likewise, all SNPs that reached both a
significance of p<0.001 in CGEMS (550) and a significance of p<0.01 in the Easton study (7000) were selected.
It is important to note that these studies were performed on different platforms and that, as a result, many of the
tagging SNPs from the same genes that displayed associations with risk on the two platforms were not the same.
Thus, we mapped the SNPs displaying significant association with risk in either study into specific haplotype
blocks (r² > 0.6) defined by HapMap data. Where SNPs from both studies mapped to a haplotype block, the
association was considered validated and the SNP in the haplotype block displaying the most significant
association with risk was selected. Likewise, when several SNPs from a haplotype block exhibited significant
association with risk, only the SNP with the most significant association was selected. All of these SNPs were
assessed for assay conversion on the Illumina GoldenGate system through consultation with the Illumina
Bioinformatics Support Center. The resulting list of SNPs that could be genotyped on the GoldenGate platform
was ordered by the significance of the association with risk. Those SNPs present in genes associated with cell
division were then selected for genotyping and were combined with the initial 144 SNPs from our earlier study
until 384 SNPs had been selected for the array study. A Sentrix 384-bead array containing these 384 SNPs will
shortly be ordered from Illumina Inc.
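
The block-based selection rule described above can be sketched in a few lines of Python. This is one reading of the procedure, not the code actually used: a haplotype block counts as validated when one study reaches p<0.001 and the other reaches p<0.01, and the single most significant SNP in the block is kept. The record layout, the study labels "easton" and "cgems", the function name and the SNP and block identifiers are all hypothetical.

```python
from collections import defaultdict

STRICT_P = 0.001   # threshold required in one study
LENIENT_P = 0.01   # threshold required in the other study


def select_validated_snps(records):
    """Pick one SNP per haplotype block that is supported by both studies.

    records: iterable of (snp_id, block_id, study, p_value) tuples,
             with study equal to "easton" or "cgems".
    Returns a dict mapping block_id to the selected snp_id.
    """
    # Best (smallest p-value) SNP per study within each block.
    by_block = defaultdict(dict)
    for snp_id, block_id, study, p_value in records:
        current = by_block[block_id].get(study)
        if current is None or p_value < current[1]:
            by_block[block_id][study] = (snp_id, p_value)

    selected = {}
    for block_id, best in by_block.items():
        if "easton" not in best or "cgems" not in best:
            continue  # block seen in only one study: not validated
        p_easton = best["easton"][1]
        p_cgems = best["cgems"][1]
        validated = (
            (p_easton < STRICT_P and p_cgems < LENIENT_P)
            or (p_cgems < STRICT_P and p_easton < LENIENT_P)
        )
        if validated:
            # Keep the most significant SNP across both studies.
            selected[block_id] = min(best.values(), key=lambda pair: pair[1])[0]
    return selected
```

The selected SNPs would still need to be checked for assay conversion and restricted to cell division genes, as the text describes, before being added to the array.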

In  parallel,  we  requested  DNA  samples  and  risk  factor  data  from  six  collaborating  groups.  All  have  agreed  to 
participate  and  are  currently  selecting  and  aliquoting  these  DNA  samples  for  shipment  to  the  Mayo  Clinic. 
Epidemiological  risk  factor  data  matching  all  of  these  specimens  are  available  through  the  CIMBA  consortium 
database  (Couch  et  al.,  2007;  Chenevix-Trench  et  al.,  2007).  A  summary  of  the  contributions  of  BRCA1/2 
carriers from the major collaborating centers is as follows: 500 carriers will come from two collections at the
Mayo Clinic (Drs. Couch and Szabo, PIs); 1200 will come from EMBRACE, a UK collection of carriers
directed by Dr. Easton; 500 will come from GEO-HEBON, a Dutch national collection (Drs. Hogervorst and
Rookus, PIs); 700 will come from Vienna (Dr. Furhauser, PI); 500 will come from the University of
Pennsylvania (Dr. Nathanson, PI); 500 from a German National consortium managed by Dr. Schmutzler; and 800
from Australia (Dr. Spurdle, PI). Only BRCA1 or BRCA2 female breast cancer cases or unaffected individuals
are included. A total of 2,000 BRCA1 mutation carriers (1,000 affected with breast cancer and 1,000 unaffected)
and 2,000 BRCA2 mutation carriers (1,000 affected with breast cancer and 1,000 unaffected) will be used for
genotyping. We computed the statistical power of the study, accounting for multiple testing at a significance
level of 0.05/384 (~10⁻⁴). At this level of significance and when genotyping 4,000 carriers the study has 80%
power to detect a risk ratio of 1.3 for a SNP of MAF>0.20. Thus, the study is adequately powered to detect
associations  with  small  effect  sizes. 
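
The multiple-testing threshold used in the power calculation is a Bonferroni correction: the overall significance level divided by the number of SNPs on the array. A minimal sketch of that arithmetic, with hypothetical function names:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def passes_bonferroni(p_value, alpha=0.05, n_tests=384):
    """True when a single SNP's p-value survives the correction."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(0.05, 384)
print(f"{threshold:.2e}")  # prints 1.30e-04
```

Bonferroni treats the 384 tests as independent, so it is conservative when SNPs on the array are correlated.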

Once these samples arrive (October 2007), they will be aliquoted at 50 ng/µl into 96-well plates with 2%
duplicates and four controls (one water control and a CEPH trio). The samples will then be genotyped on the
384-array in the Mayo Clinic genotyping center. Genotyping data will be assessed for sample and SNP call
rates. SNPs displaying call rates <95% will be excluded. Tests of Hardy-Weinberg equilibrium (HWE) for
BRCA1 carriers will be performed and SNPs with p<0.001 will be excluded. Our primary analysis will be to
evaluate  the  association  of  each  SNP  individually  with  breast  cancer  using  a  weighted  Cox  proportional  hazards 
model  that  measures  time  to  disease  diagnosis  and  incorporates  information  on  both  disease  status  and  age. 
Analyses  will  be  adjusted  for  study  center  and  geographical  region  to  allow  for  differences  in  disease  risks  and 
allele  frequencies.  A  robust  variance  estimation  approach  will  be  used  to  allow  for  more  than  one  carrier  from 
the  same  family.  We  will  account  for  prophylactic  oophorectomy  in  the  analyses  because  oophorectomy  is 
known  to  reduce  risk  of  breast  cancer  by  up  to  50%  in  BRCA1  and  BRCA2  mutation  carriers  (Rebbeck  et  al., 
2002).  Adjustment  for  other  risk  factors  will  also  be  performed. 
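
The two SNP-level exclusion rules stated above (call rate below 95%, HWE p-value below 0.001) amount to a simple filter applied before the association analysis. The sketch below assumes the call rate and HWE p-value for each SNP have already been computed. The `SnpQc` type, the function name and the SNP identifiers are hypothetical.

```python
from typing import Dict, List, NamedTuple


class SnpQc(NamedTuple):
    call_rate: float  # fraction of samples with a genotype call
    hwe_p: float      # Hardy-Weinberg p-value


def passing_snps(
    qc: Dict[str, SnpQc],
    min_call_rate: float = 0.95,
    min_hwe_p: float = 0.001,
) -> List[str]:
    """Return SNP identifiers that survive both exclusion rules, sorted by name."""
    return sorted(
        snp_id
        for snp_id, metrics in qc.items()
        if metrics.call_rate >= min_call_rate and metrics.hwe_p >= min_hwe_p
    )


# Invented QC metrics, for illustration only.
example = {
    "snp_1": SnpQc(call_rate=0.99, hwe_p=0.40),
    "snp_2": SnpQc(call_rate=0.93, hwe_p=0.40),    # fails the call-rate rule
    "snp_3": SnpQc(call_rate=0.99, hwe_p=0.0002),  # fails the HWE rule
}
print(passing_snps(example))
```

Only SNPs returned by the filter would go forward to the weighted Cox analysis described in this paragraph.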

At  the  conclusion  of  the  study,  in  early  to  mid  2008,  we  expect  to  have  identified  a  number  of  novel  modifiers 
of  breast  cancer  risk  in  BRCA1  and  BRCA2  carriers.  These  modifiers  will  prove  useful  for  identifying  carriers 
who  are  at  lower  risk  of  cancer  compared  to  all  carriers  and  may  benefit  from  a  watchful  waiting  approach  to 
cancer  prevention  as  opposed  to  invasive  prophylactic  oophorectomy  and  mastectomy. 

Aim  2.  To  demonstrate  that  Val57Ile  alters  STK15  function  and  co-operates  with  BRCA1/2 
mutations  to  disrupt  mitotic  regulation. 

Tasks 5 and 6. When we began this study, the F31I variant had already been shown to alter the activity of the
Aurora  box-1  of  STK15  protein,  resulting  in  disruption  of  p53  binding  and  a  decreased  rate  of  degradation  of 
STK15  (Ewart-Toland  et  al.,  2003).  It  had  also  been  shown  by  others  that  stabilized  STK15  was  associated  with 
centrosome  amplification  and  failure  of  cytokinesis,  increased  chromosomal  instability  and  aneuploidy, 
suggesting  a  direct  effect  of  the  F31I  variant  on  promotion  of  tumor  formation  (Ewart-Toland  et  al.,  2003).  As  a 
result,  Tasks  5  and  6  were  deemed  to  be  complete.  Our  subsequent  finding  that  F31I  does  not  influence  breast 
cancer  risk  in  BRCA1  and  BRCA2  carriers  suggests  that  these  effects  of  STK15  stability  make  no  contribution  to 
cancer  risk. 

Aim  3.  To  establish  the  involvement  of  STK15  in  breast  tumor  formation  using  Val57  and  Ile57-STK15 
transgenic mice and to evaluate synergism with BRCA1/2 by intercrossing with conditional brca1 and brca2
mutant  mouse  models. 

Tasks 7-9. As noted above, neither STK15 F31I nor V57I is associated with increased risk of breast cancer. On
the basis of this finding we felt that it was inappropriate to continue with the proposed generation of transgenic
animals expressing these mutant forms of STK15 in order to assess their influence on breast cancer
development in vivo. Instead, we focused our efforts on Tasks 3 and 4 in an effort to identify variants in other
mitotic  regulators  that  modify  the  risk  of  breast  cancer  in  BRCA1  and  BRCA2  carriers. 

Key  Research  Accomplishments 

•  The  F31I  and  V57I  polymorphisms  in  STK15  are  not  associated  with  modification  of  breast  cancer  risk 
in  BRCA1  and  BRCA2  carriers. 

•  The  F31I  polymorphism  in  STK15  is  not  associated  with  breast  cancer  risk  in  a  series  of  case-control 
studies. 

•  Common  genetic  variants  in  genes  encoding  mitotic  regulators  are  associated  with  altered  risk  of  breast 
cancer  in  a  breast  cancer  case-control  study. 

Reportable  Outcomes 

1.  The  Breast  Cancer  Association  Consortium.  Commonly  studied  single  nucleotide  polymorphisms  and  breast 
cancer:  results  from  the  Breast  Cancer  Association  Consortium.  JNCI.  98:1382-1396,  2006. 

2. Chenevix-Trench G, Milne RL, Antoniou AC, Couch FJ, Easton DF, Goldgar DE; CIMBA. An
international initiative to identify genetic modifiers of cancer risk in BRCA1 and BRCA2 mutation carriers:
the Consortium of Investigators of Modifiers of BRCA1 and BRCA2 (CIMBA). Breast Cancer Res. 9:104
[Epub], 2007.

3. Couch FJ, Sinilnikova O, Vierkant RA, Pankratz VS, Fredericksen ZS, Stoppa-Lyonnet D, Coupier I,
Hughes D, Hardouin A, Berthet P, Peock S, Cook M, Baynes C, Hodgson S, Morrison PJ, Porteous ME,
Jakubowska A, Lubinski J, Gronwald J, Spurdle AB; kConFab, Schmutzler R, Versmold B, Engel C,
Meindl A, Sutter C, Horst J, Schaefer D, Offit K, Kirchhoff T, Andrulis IL, Ilyushik E, Glendon G, Devilee
P, Vreeswijk MP, Vasen HF, Borg A, Backenhorn K, Struewing JP, Greene MH, Neuhausen SL, Rebbeck
TR, Nathanson K, Domchek S, Wagner T, Garber JE, Szabo C, Zikan M, Foretova L, Olson JE, Sellers TA,
Lindor N, Nevanlinna H, Tommiska J, Aittomaki K, Hamann U, Rashid MU, Torres D, Simard J, Durocher
F, Guenard F, Lynch HT, Isaacs C, Weitzel J, Olopade OI, Narod S, Daly MB, Godwin AK, Tomlinson G,
Easton DF, Chenevix-Trench G, Antoniou AC; on behalf of the Consortium of Investigators of Modifiers of
BRCA1/2. AURKA F31I Polymorphism and Breast Cancer Risk in BRCA1 and BRCA2 Mutation Carriers:
the Consortium of Investigators of Modifiers of BRCA1 and BRCA2 (CIMBA). Cancer Epi Bio & Prev.
16:1416-1421, 2007.

Conclusions 

We  have  used  very  large  datasets  to  demonstrate  that  the  F31I  polymorphism  in  STK15  does  not  increase  the 
risk  of  breast  cancer  in  carriers  of  BRCA1  and  BRCA2  mutations.  A  similar  lack  of  effect  was  seen  in  another 
very  large  pooled  dataset  from  sporadic  breast  cancer  case-control  studies.  However,  it  is  likely  that 
polymorphisms  in  other  mitotic  regulators  alter  breast  cancer  risk  in  sporadic  and  familial  breast  cancer  patients. 
We  have  evaluated  a  number  of  such  variants  in  a  breast  cancer  case-control  study  and  have  initiated  a  study 
aimed  at  validating  these  findings  in  BRCA1  and  BRCA2  carriers.  Specifically,  we  are  gathering  DNA  and  risk 
factor  data  from  4,000  BRCA1  and  BRCA2  mutation  carriers  from  six  collaborating  centers.  Once  these  samples 
are in hand (expected by October 2007), the variants will be genotyped on a 384-SNP GoldenGate array and
assessed  for  breast  cancer  risk  modification  in  BRCA1  and  BRCA2  carriers. 

This work (Tasks 3 and 4) is not complete so we have filed for a no-cost extension of the project until 5-20-2008.

Abstract

with breast cancer risk in the homozygous state in prior
studies. We evaluated whether the AURKA F31I polymorphism
modifies breast cancer risk in BRCA1 and BRCA2
mutation carriers from the Consortium of Investigators of
Modifiers of BRCA1/2. Consortium of Investigators of
Modifiers of BRCA1/2 was established to provide sufficient
statistical power through increased numbers of mutation
carriers to identify polymorphisms that act as modifiers of
cancer risk and can refine breast cancer risk estimates in
BRCA1 and BRCA2 mutation carriers. A total of 4,935 BRCA1
and 2,241 BRCA2 mutation carriers and 11 individuals
carrying both BRCA1 and BRCA2 mutations was genotyped
for F31I. Overall, homozygosity for the 31I allele was not
significantly associated with breast cancer risk in BRCA1
and BRCA2 carriers combined [hazard ratio (HR), 0.91; 95%
confidence interval (95% CI), 0.77-1.06]. Similarly, no
significant association was seen in BRCA1 (HR, 0.90; 95%
CI, 0.75-1.08) or BRCA2 carriers (HR, 0.93; 95% CI, 0.67-1.29)
or when assessing the modifying effects of either bilateral
prophylactic oophorectomy or menopausal status of BRCA1
and BRCA2 carriers. In summary, the F31I polymorphism in
AURKA is not associated with a modified risk of breast
cancer in BRCA1 and BRCA2 carriers. (Cancer Epidemiol
Biomarkers Prev 2007;16(7):1416-21)


Introduction 

The AURORA-A/AURKA/BTAK/STK15 gene encodes a serine/threonine
kinase that regulates mitotic chromosome segregation. AURKA is
amplified and overexpressed in breast and other tumors and is
associated with centrosome amplification, failure of cytokinesis,
and aneuploidy. Genetic mapping studies in mouse models suggest
that AURKA is a genetic modifier of cancer risk (1). In addition,
mouse models of AURKA exhibit infrequent mammary gland tumor
formation but display synergy in tumor formation when combined
with overexpressed oncogenes or disrupted tumor suppressors,
suggesting that AURKA is a low-risk cancer susceptibility gene (2).

Further evidence for a role of AURKA in breast cancer comes
from observations that homozygosity for the F31I polymorphism
in AURKA is associated with an increased risk for breast
cancer. In a study of incident breast cancer cases (n = 941) and
age-matched population controls (n = 830), Egan et al. (3) found
that Ile/Ile homozygotes were at increased risk for breast
cancer [odds ratio (OR), 1.54; 95% confidence interval (95% CI),
0.96-2.47], although this finding was not significant. Sun et al.
(4) observed that the Ile-encoding allele is the common allele in
the Chinese population, whereas the Phe-encoding allele is more
common in Caucasian populations (4). In addition, an association
between Ile/Ile homozygotes and estrogen receptor-negative breast
carcinomas (OR, 2.56; 95% CI, 1.24-5.26) was detected. Lo et al. (5)
reported a significant association between AURKA haplotypes
and breast cancer risk. Ewart-Toland et al. (6) also found an
increase in cancer risk for the Ile/Ile homozygotes (OR, 1.35;
95% CI, 1.12-1.64; P = 0.002) in a meta-analysis of data from
four case-control breast cancer populations. Furthermore,
postmenopausal women homozygous for the F31I and I57V
alleles of AURKA in a case-control study nested within the
Nurses' Health Study prospective cohort had an increased risk
of invasive breast cancer (OR, 1.63; 95% CI, 1.08-2.45; ref. 7). In
contrast, Dai et al. (8) did not observe a significant association
with breast cancer risk for Ile/Ile homozygotes (OR, 1.2; 95%
CI, 0.9-1.6) in a population-based case-control series of Han
Chinese, and Fletcher et al. (9) found no association between
Ile/Ile homozygotes and risk of bilateral breast cancer (OR,
0.63; 95% CI, 0.34-1.13). Importantly, the F31I variant has been
shown to alter the activity of the Aurora box-1 of the AURKA
protein, resulting in disruption of p53 binding and a decreased
rate of degradation of AURKA. The stabilized AURKA may
lead to centrosome amplification and failure of cytokinesis,
increased chromosomal instability and aneuploidy, and promotion
of tumor formation (1).

Mutations in BRCA1 and BRCA2 are correlated with
aberrant duplication of the centrosome leading to centrosome
amplification, chromosome missegregation, and aneuploidy
(10-12). Amplification of AURKA has also been detected at
much higher frequency in tumors from BRCA1 and BRCA2
mutation carriers than in sporadic breast tumors, suggesting
that overexpression of AURKA and inactivation of BRCA1 and
BRCA2 cooperate during tumor development and/or progression.
Based on these data, we hypothesized that the F31I
polymorphism modifies the risk of breast cancer in BRCA1 and
BRCA2 mutation carriers. To address this hypothesis, AURKA
F31I was genotyped on BRCA1 and BRCA2 deleterious
mutation carriers from 16 clinic and population-based research
studies and multicenter consortia participating in the Consortium
of Investigators of Modifiers of BRCA1/2 (CIMBA), and
the association of F31I with breast cancer risk was assessed.


Materials  and  Methods 

Subjects.  BRCA1  and  BRCA2  mutation  carriers  were 
identified  through  16  clinic  and  population-based  research 
studies  and  multicenter  consortia  participating  in  the  CIMBA. 
This  international  consortium  was  established  in  2005  by  a 
group  of  investigators  interested  in  identifying  modifiers  of 
cancer  risk  in  BRCA1  and  BRCA2  mutation  carriers  that  could 
be  used  to  refine  cancer  risk  estimates.  Recruitment  of 
mutation  carriers  for  this  and  other  CIMBA  studies  was 
approved  by  institutional  review  boards  or  ethics  committees 
at  all  sites.  BRCA1  and  BRCA2  mutation  carriers  were  defined 
as  carriers  of  frameshifting  small  deletions  and  insertions, 
nonsense  mutations,  splice  site  mutations  verified  in  vitro,  and 
large  genomic  rearrangements  that  result  in  a  premature  stop 
codon  in  either  BRCA1  or  BRCA2.  These  mutations  were 
identified  by  a  variety  of  screening  techniques  and  sequence 
verified.  As  the  K3326X  variant  in  exon  27  is  not  associated 
with  high  risk  of  breast  cancer,  this  and  other  mutations 
causing  stop  codons  in  exon  27  were  excluded.  Missense 
mutations that have been classified as pathogenic by multifactorial
likelihood approaches were included in the deleterious
category (12-14), whereas carriers of all other missense and
intronic  mutations  in  BRCA1  and  BRCA2  were  excluded  from 
the  study.  Phenotypic  data  for  mutation  carriers  were 
provided  by  each  contributing  center.  Data  were  collected  on 
year  of  birth,  mutation  description,  ethnicity,  country  of 
residence,  age  at  last  follow-up,  ages  at  breast  and  ovarian 
cancer  diagnosis,  age  at  bilateral  prophylactic  mastectomy,  age 
at  bilateral  prophylactic  oophorectomy,  and  status  and  age  at 
menopause.  These  and  other  available  epidemiologic  data 
obtained  from  risk  factor  questionnaires  and/or  medical 
records  were  uniformly  coded  and  stored  in  a  centralized 
CIMBA  database. 

Genotyping.  The  F31I  polymorphism  (rs2273535)  of 
AURKA  was  genotyped  by  13  groups  by  the  5'  nuclease  assay 
(Taqman)  on  an  ABI  7900HT  Sequence  Detection  System 
(Applied Biosystems). PCR primers were
5'-CTGGCCACTATTTACAGGTAATGGA-3' (forward) and
5'-TGGAGGTCCAAAACGTGTTCTC-3' (reverse). Probes were
VIC-ACTCAGCAATTTCCTT and FAM-CTCAGCAAATTCCTT. The
annealing temperature was 60°C. Lund investigators used
an alternative reverse primer (CATCTTTTGCTTTCATGAATGCCAG)
and did the 5' nuclease assay on a RotorGene
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Table  1.  Characteristics  of  study  subjects  by  site 


Source      Ascertainment              BRCA1    BRCA1    Total    BRCA2    BRCA2    Total    B1/2†     B1/2    Total     Total
                                       cases  unaff.*    BRCA1    cases   unaff.    BRCA2    cases   unaff.     B1/2  carriers
MAGIC       Clinic                       303      428      731      137      160      297        3        0        3     1,031
GEMO        Clinic                       413      276      689      223       84      307        0        0        0       996
EMBRACE     Clinic                       235      219      454      156      148      304        1        2        3       761
Poland      Clinic                       307      427      734        0        0        0        0        0        0       734
kConFab     Clinic                       203      201      404      169      143      312        0        0        0       716
GCHBOC      Clinic                       286      113      399      173       52      225        3        0        3       627
MSKCC       Clinic                       174      117      291      102       70      172        1        0        1       464
Ontario     Clinic and population        125       52      177      100       41      141        0        0        0       318
LUMC        Clinic                        99      120      219       12       20       32        0        0        0       251
Lund        Clinic                        73       88      161       38       32       70        0        0        0       231
MOD-SQUAD   Clinic                        82       67      149       28       15       43        0        0        0       192
HEBCS       Clinic                        56       39       95       54       40       94        0        0        0       189
DKFZ        Clinic                        82       41      123       30       21       51        0        0        0       174
MAYO        Clinic                        53       23       76       26       20       46        0        0        0       122
INHERIT     Clinic                        33       37       70       40       41       81        0        0        0       151
NCI         Clinic                        47      116      163       17       50       67        0        0        0       230
Total                                  2,571    2,364    4,935    1,305      937    2,242        8        2       10     7,187

Abbreviations: MAGIC, Modifiers and Genetics in Cancer; GEMO, Genetic Modifiers of cancer risk in BRCA1/2 mutation carriers study; GCHBOC, German
Consortium for Hereditary Breast and Ovarian Cancer; EMBRACE, Epidemiological Study of BRCA1 and BRCA2 Mutation Carriers; kConFab, Kathleen Cunningham
Consortium for Research into Familial Breast Cancer; INHERIT BRCAs, Interdisciplinary Health Research International Team on Breast Cancer susceptibility;
MSKCC, Memorial Sloan-Kettering Cancer Center; MAYO, Mayo Clinic; LUMC, Leiden University Medical Center; MOD-SQUAD, Modifier Study of Quantitative
Effects on Disease; HEBCS, Helsinki Breast Cancer Study; DKFZ, Deutsches Krebsforschungszentrum Heidelberg; NCI, National Cancer Institute.

*The term unaff. refers to individuals not affected with breast cancer.

†B1/2 refers to individuals with both BRCA1 and BRCA2 deleterious mutations.


(Corbett Research). INHERIT investigators directly sequenced
the polymorphism using the following primers:
5'-GGGTGAGGAATTGGAGGGGAT-3' (forward) and
5'-GGACACCAATTTATGCTGTGTCCT-3' (reverse). Genotyping for the
HEBCS was done by Amplifluor fluorescent genotyping
(KBioscience). Genotyping for the DKFZ and Polish studies
was done by fragment analysis. DNA fragments containing
the polymorphism were amplified using forward primer
5'-AGTTGGAGGTCCAAAACGTG-3' and Cy5-labeled reverse
primer 5'-CGCTGGGAAGTATTTGAAGG-3', digested with
2.5 units XapI (Fermentas), separated on 3% agarose gel
(Polish samples) or by capillary gel electrophoresis (German
samples) on a CEQ 8000 DNA Analysis System (Beckman),
and sized relative to CEQ DNA Size Standard-400 in each
well. Allele sizes were 114 bp for the T allele and 78 bp for the
A allele.
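
The fragment-size readout described above maps to genotypes mechanically. The following is a minimal sketch of that mapping, not the laboratories' actual calling procedure. Only the 114 bp (T allele) and 78 bp (A allele) sizes come from the text; the function name `call_genotype`, the 3 bp sizing tolerance, and the assumption that only the labeled fragments are passed in are illustrative.

```python
# Allele sizes as reported in the text.
ALLELE_SIZES_BP = {"T": 114, "A": 78}
# Hypothetical sizing tolerance; the study does not report one.
TOLERANCE_BP = 3


def call_genotype(fragment_sizes_bp):
    """Return 'TT', 'AT', or 'AA', or None if the sample cannot be called."""
    alleles = set()
    for size in fragment_sizes_bp:
        matches = [
            allele
            for allele, expected in ALLELE_SIZES_BP.items()
            if abs(size - expected) <= TOLERANCE_BP
        ]
        if len(matches) != 1:
            return None  # unexpected fragment size: leave the sample uncalled
        alleles.add(matches[0])
    if not alleles:
        return None  # no fragments detected
    if len(alleles) == 1:
        return next(iter(alleles)) * 2  # homozygote
    return "AT"  # both fragment sizes present: heterozygote
```

Returning `None` rather than guessing keeps failed samples out of the call-rate numerator, which matters for the quality control thresholds applied later.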

Statistical  Methods.  Hazard  ratios  (HR)  were  modeled 
using  Cox  proportional  hazards  regression  analysis,  with 
breast  cancer  as  the  outcome  and  age  as  the  time  variable 
(15).  We  corrected  for  possible  ascertainment  bias  using  a 
weighted  cohort  approach  (16).  Briefly,  this  involves  assigning 
weights  to  the  mutation-carrying  subjects  such  that  the 
reweighted  incidence  rates  observed  in  the  study  sample  are 
consistent  with  the  age-dependent  penetrances  for  breast 
cancer  onset  established  in  carriers  of  inactivating  mutations 
in  BRCA1  and  BRCA2.  Subjects  were  followed  from  birth  until 
the  earliest  occurrence  of  breast  cancer  (3,884),  bilateral 
prophylactic  mastectomy  (232),  ovarian  cancer  (643),  age  80 
(97),  or  age  at  last  contact  (2,331).  Subjects  were  censored  at  age 
80 because population-based incidence rates for older mutation
carriers are unreliable, and accurate sampling weights
cannot  be  assigned.  Carriers  with  both  BRCA1  and  BRCA2 
mutations  were  included  once  in  overall  analyses  and  were 
also  included  in  each  of  the  BRCA1  and  BRCA2  gene-specific 
analyses.  The  number  of  subjects  in  each  family  varied  from 
1  to  33,  with  75%  of  families  represented  by  a  single  individual. 
Because  the  exact  relationships  among  the  family  members 
were  not  available,  we  accounted  for  the  nonindependence  of 
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observations  within  families  using  a  robust  variance  estimate 
(17).  Primary  analyses  modeled  AURKA  as  a  recessive  effect, 
comparing  those  with  two  copies  of  the  minor  allele  with  those 
with  less  than  two  copies.  Secondary  analyses  examined 
associations  using  a  two  degree-of-freedom  general  model, 
simultaneously  comparing  subjects  with  one  copy  or  with  two 
copies  of  the  minor  allele  with  the  subjects  with  zero  copies. 
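
The follow-up rule stated above (observation from birth until the earliest of breast cancer, bilateral prophylactic mastectomy, ovarian cancer, age 80, or last contact) can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis code: the function name `follow_up`, its argument names, and the handling of ties (a breast cancer diagnosed at the same age as a censoring event counts as an event) are hypothetical.

```python
MAX_AGE = 80  # follow-up is truncated at age 80, as stated in the text


def follow_up(age_last_contact, age_breast_cancer=None,
              age_mastectomy=None, age_ovarian_cancer=None):
    """Return (exit_age, is_event) for one carrier followed from birth.

    is_event is True only when breast cancer is the earliest occurrence.
    Tie handling is an assumption made for this sketch.
    """
    censoring_ages = [
        age
        for age in (age_mastectomy, age_ovarian_cancer, age_last_contact, MAX_AGE)
        if age is not None
    ]
    censor_age = min(censoring_ages)
    if age_breast_cancer is not None and age_breast_cancer <= censor_age:
        return age_breast_cancer, True
    return censor_age, False
```

The returned pair is the usual input to a survival model with age as the time variable: an exit age and an event indicator.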

Overall  analyses  were  carried  out  for  all  subjects  regardless 
of  whether  they  carried  a  mutation  in  BRCA1  or  BRCA2  or 
both.  All  analyses  accounted  for  birth  cohort  and  country  of 
residence  by  including  them  as  stratification  variables  in  the 
Cox  regression.  The  overall  analysis  also  accounted  for  study 
site  and  mutation  status.  Additional  analyses  were  conducted 
to obtain risk estimates for individuals with different
characteristics, as defined by gene status, menopausal status,
oophorectomy  status,  and  study  site.  Gene-specific  results 
accounted  for  study  site  along  with  birth  cohort  and  country  of 
residence  by  use  of  stratification  variables.  Site-specific  results 
accounted  for  mutation  status,  birth  cohort,  and  country  of 
residence.  Menopausal  status  and  oophorectomy  status  were 
modeled  as  time-dependent  covariates  and  results  accounted 
for  group  status  and  mutation  status.  In  secondary  analyses, 
the  influence  of  benign  prophylactic  oophorectomy  and 
menopausal status on associations between the Ile/Ile genotype
and breast cancer risk was also evaluated. As these
covariates  did  not  confound  the  observed  associations,  the 
associations  reported  in  Table  2  are  not  adjusted  for  these 
variables. 

Among those who provided ethnicity information, 97%
were Caucasian, 2% were Ashkenazi Jewish, and the remaining
1% were "other." Those who did not provide ethnicity
information were grouped in a separate "missing" category for
analysis purposes. Ethnicity was initially included as an
additional stratification variable but was subsequently excluded
because of the absence of any effect on the results. We
assessed  the  possible  heterogeneity  of  risk  ratios  across  study 
site  using  standard  tests  of  interaction.  A  sensitivity  analysis 
assessing  the  effect  of  possible  survival  bias  was  conducted  by 
excluding  cases  ascertained  more  than  3  years  after  diagnosis. 
All  statistical  tests  were  two  sided,  and  all  analyses  were 
carried  out  using  the  Statistical  Analysis  System  (SAS  Institute, 
Inc.)  and  S-Plus  (Insightful)  software  systems. 
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Results 

A total of 4,935 female BRCA1 and 2,241 female BRCA2 deleterious
mutation carriers, and 11 individuals carrying both BRCA1 and
BRCA2 mutations, were included in this study. Of these 7,187
mutation  carriers,  3,884  had  a  diagnosis  of  breast  cancer  at  the  end 
of  follow-up  and  3,303  were  censored  as  unaffected  at  a  mean  age 
of  43.4  years.  The  distribution  of  BRCA1  and  BRCA2  carriers  by 
study  site,  gene,  and  cancer  status  is  shown  in  Table  1.  To  avoid 
overlap  between  studies,  we  compared  carriers  by  country  of 
origin,  year  of  birth,  mutation,  and  reported  ages.  Duplication  of 
samples  between  MAYO  and  MAGIC  and  between  GEMO  and 
MAGIC  was  detected.  In  both  instances,  the  duplicated  samples 
were  excluded  from  the  MAGIC  data  set. 
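
The overlap check described above can be sketched as a grouping of carriers on shared descriptors. This is a hedged illustration only: the text states that carriers were compared by country of origin, year of birth, mutation, and reported ages, but not how matches were decided. The exact-match key, the field names, and the function name `find_cross_study_duplicates` are assumptions.

```python
from collections import defaultdict


def find_cross_study_duplicates(records):
    """Group records that share all descriptors but come from different studies.

    Each record is a dict with the hypothetical keys 'study', 'country',
    'birth_year', 'mutation', and 'age_last_followup'.
    """
    groups = defaultdict(list)
    for record in records:
        key = (
            record["country"],
            record["birth_year"],
            record["mutation"],
            record["age_last_followup"],
        )
        groups[key].append(record)
    # Keep only groups whose members were submitted by more than one study.
    return [
        group
        for group in groups.values()
        if len({record["study"] for record in group}) > 1
    ]
```

In practice, candidate matches found this way would need manual review before any sample is excluded, because unrelated carriers of a founder mutation can share all of these descriptors.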

The  distribution  of  the  AURKA  F31I  genotypes  is  shown  in 
Table 2. Of the 363 (5%) carriers homozygous for the Ile-encoding
allele, 188 were affected with breast cancer. The frequency
of the recessive Ile/Ile-encoding genotype in the
16 groups varied between 3% and 8%, which is similar to estimates
from other populations (6). There was no difference in the
frequency  of  the  Ile/Ile  recessive  genotype  across  genotyping 
platforms  ( P  =  0.33).  Similarly,  the  study  sites  with  the  highest 
Ile/Ile  frequencies  did  not  have  ethnic  mixtures  significantly 
different  to  the  other  study  sites.  The  F31I  polymorphism  did 
not  deviate  significantly  from  Hardy-Weinberg  equilibrium 
(P  =  0.07)  among  all  7,187  affected  and  unaffected  carriers. 
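
For readers unfamiliar with the Hardy-Weinberg check mentioned above, a minimal sketch of the standard one-degree-of-freedom chi-square test is given here. It is not necessarily the procedure the authors used, and the report gives no heterozygote counts, so any counts passed to the function are hypothetical. The function name `hwe_chi_square` is illustrative.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Takes counts of the three genotypes and returns (chi2, p_value).
    Assumes both alleles are observed, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele a
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

With hypothetical counts of 30, 40, and 30, the expected counts are 25, 50, and 25, so the statistic is 1 + 2 + 1 = 4.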

The  estimated  risk  of  breast  cancer  associated  with  the 
recessive  genotype  for  F31I  in  BRCA1  and  BRCA2  carriers 
using  a  weighted  Cox  proportional  hazards  model  is  shown  in 
Table  2.  Although  there  was  a  suggestion  of  a  protective  effect 
(HR, 0.91; 95% CI, 0.77-1.06), overall, the result was not
statistically significant. Similarly, no association with risk was
observed for individual participating centers other than for
two centers (Ontario and HEBCS) that contributed small
numbers of carriers to the study (Table 2). A test for
heterogeneity across study site was not significant (P = 0.06).
In an effort to account for the trend toward heterogeneity, we
investigated the influence of the three sites that were
significantly different from the other sites [MOD-SQUAD
(P = 0.02), GEMO (P = 0.01), and DKFZ (P = 0.03)] on the
overall effect. Exclusion of each site in turn did not substantially
alter the overall HR or the significance of the association.
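
The relation between a reported HR, its 95% CI, and statistical significance can be made explicit with a short calculation. The sketch below assumes a Wald-type interval that is symmetric on the log scale, which is an assumption about how the interval was built; the function name `wald_from_ci` is illustrative, and the result is approximate because the published values are rounded.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the log-scale standard error, z statistic, and two-sided P.

    Assumes the interval is exp(log(hr) +/- z_crit * se).
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P
    return se, z, p_value


# Overall estimate reported in the text: HR 0.91 (95% CI, 0.77-1.06).
se, z, p_value = wald_from_ci(0.91, 0.77, 1.06)
```

Because the interval includes 1, the recovered P is well above 0.05, in line with the statement that the overall result was not statistically significant.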

Because  BRCA1  is  phosphorylated  by  AURKA  (18),  we 
evaluated  whether  the  Ile/Ile  genotype  was  associated  with 
risk  of  breast  cancer  in  BRCA1  or  BRCA2  carriers.  No 
significant  association  with  risk  was  detected  for  either  BRCA1 
(HR, 0.90; 95% CI, 0.75-1.08) or BRCA2 carriers (HR, 0.93; 95%
CI, 0.67-1.29; Table 2). As other studies have reported an
association between the recessive Ile/Ile-encoding genotype
and postmenopausal status in noncarriers (3, 7), we considered
the influence of menopausal status of carriers on breast cancer
risk. At the end of follow-up, 4,201 carriers were premenopausal
and 2,986 were postmenopausal. No significant
association with risk was detected (Table 2). Because prophylactic
oophorectomy substantially reduces the risk of breast
cancer in BRCA1 and BRCA2 mutation carriers (19), we also
evaluated the influence of prophylactic oophorectomy status.
A total of 707 individuals reported undergoing prophylactic
oophorectomy, 4,298 reported no history of oophorectomy,
whereas 2,182 (30%) provided no data at last follow-up.
Associations with breast cancer risk by category of prophylactic
oophorectomy did not differ markedly from the overall
results.  Secondary  analyses  using  a  two  degree-of-freedom 
general  model  also  failed  to  detect  a  significant  association  for 
either  a  single  copy  (P  =  0.97)  or  two  copies  (P  =  0.24)  of  the 
F31I  polymorphism  compared  with  no  copies. 
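
The two genotype codings used in the primary and secondary analyses differ only in how the number of Ile-encoding alleles enters the model. A minimal sketch, with hypothetical function names, of the covariates each coding would produce:

```python
def recessive_code(ile_copies):
    """Recessive model: 1 for two copies of the minor allele, else 0."""
    if ile_copies not in (0, 1, 2):
        raise ValueError("allele count must be 0, 1, or 2")
    return 1 if ile_copies == 2 else 0


def general_codes(ile_copies):
    """Two degree-of-freedom model: (one-copy indicator, two-copy indicator).

    Carriers of zero copies are the reference group and map to (0, 0).
    """
    if ile_copies not in (0, 1, 2):
        raise ValueError("allele count must be 0, 1, or 2")
    return (1 if ile_copies == 1 else 0, 1 if ile_copies == 2 else 0)
```

Under the recessive coding, heterozygotes are pooled with the reference group, which is why Table 2 tabulates carriers of 0 or 1 copy together.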

In  an  effort  to  account  for  possible  survival  bias  and  the 
inclusion  of  prevalent  cases  in  the  collection  of  BRCA1  and 


Table  2.  Association  of  AURKA  F31I  with  breast  cancer  risk 


                              0 or 1 copy Ile allele               2 copies Ile allele           HR (95% CI),      HR (95% CI),*
Group                   Unaffected  Affected  Person-years  Unaffected  Affected  Person-years   all cases         incident cases
Overall                      3,128     3,696       296,122         175       188        15,793   0.91 (0.77-1.06)  0.84 (0.65-1.08)
By mutation status
  BRCA1                      2,237     2,460       200,406         129       120        10,754   0.90 (0.75-1.08)  0.90 (0.66-1.22)
  BRCA2                        893     1,245        96,110          46        68         5,039   0.93 (0.67-1.29)  0.67 (0.44-1.03)
By menopausal status
  Premenopausal              1,935     2,049       242,208         111       106        12,834   0.84 (0.69-1.03)  0.83 (0.60-1.15)
  Postmenopausal             1,193     1,647        53,914          64        82         2,959   0.96 (0.75-1.23)  0.77 (0.51-1.16)
By oophorectomy status
  No                         1,772     2,318       201,303         101       107        10,474   0.85 (0.69-1.05)  0.82 (0.58-1.15)
  Yes                          510       160         3,793          28         9           213   1.10 (0.56-2.18)  1.03 (0.39-2.78)
  Missing                      846     1,218        91,026          46        72         5,106   0.97 (0.75-1.26)  0.86 (0.55-1.34)
By study site
  MAGIC                        559       423        41,554          29        20         2,002   1.02 (0.63-1.67)
  GEMO                         347       597        40,913          13        39         2,266   1.33 (0.97-1.82)
  EMBRACE                      353       378        30,757          16        14         1,318   0.70 (0.37-1.32)
  Poland                       399       285        30,360          28        22         2,197   0.98 (0.65-1.47)
  kConFab                      322       362        29,568          22        10         1,251   0.64 (0.34-1.22)
  GCHBOC                       157       432        24,819           8        30         1,698   0.94 (0.65-1.37)
  MSKCC                        182       268        19,371           5         9           591   0.79 (0.38-1.66)
  Ontario                       79       217        13,069          14         8         1,012   0.33 (0.13-0.82)
  LUMC                         129       106        10,350          11         5           715   0.68 (0.32-1.44)
  Lund                         113       102        11,401           7         9           803   1.05 (0.55-1.99)
  MOD-SQUAD                     78       104         7,760           4         6           388   1.56 (1.04-2.36)
  HEBCS                         75       108         8,451           4         2           344   0.27 (0.05-1.96)
  DKFZ                          61       110         6,714           1         2           109   7.05 (0.66-75.2)
  MAYO                          41        71         4,998           2         8           442   1.41 (0.65-3.07)
  INHERIT                       76        70         6,668           2         3           225   1.29 (0.45-3.67)
  NCI                          157        63         9,371           9         1           433   0.28 (0.05-1.77)

NOTE: Weighted Cox proportional hazards regression analysis, modeling AURKA F31I as a recessive genotypic effect. Results overall, by menopausal status, and by
oophorectomy status account for birth cohort, group status, country, and mutation status. Mutation-specific results account for birth cohort, group status, and country.
Group-specific results account for birth cohort, mutation status, and country. Robust variance estimates were used to correct for possible nonindependence of study
subjects.

*Cox proportional hazards regression analysis restricted to cases for whom genetic diagnosis is less than 3 y after breast cancer diagnosis.
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BRCA2  carriers,  we  repeated  our  analysis  after  excluding  cases 
diagnosed  more  than  3  years  before  the  date  of  ascertainment. 
For  this  analysis,  we  excluded  records  where  an  age  at 
interview  was  not  provided.  Overall,  the  mean  difference 
between  age  of  diagnosis  and  age  at  interview  for  the  3,422 
cases  with  available  data  was  8.7  years.  Of  these,  1,322  (38.6%) 
cases  had  been  diagnosed  less  than  3  years  before  the  date  of 
ascertainment.  When  excluding  prevalent  cases,  no  association 
between the Ile/Ile genotype and breast cancer risk was
observed,  and  the  risk  estimates  were  similar  to  those  obtained 
when  using  both  prevalent  and  incident  cases  (Table  2). 
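
As a rough plausibility check on Table 2, crude incidence rates can be computed directly from the affected counts and person-years in the overall row. This sketch is unweighted and unstratified, so it is not expected to reproduce the weighted Cox estimate (HR, 0.91); it assumes that the affected counts are the events accrued within the tabulated person-years. The function name `rate_per_1000` is illustrative.

```python
def rate_per_1000(events, person_years):
    """Crude incidence rate per 1,000 person-years."""
    return 1000.0 * events / person_years


# Overall row of Table 2: affected carriers and person-years by genotype group.
rate_0_or_1 = rate_per_1000(3696, 296122)  # carriers of 0 or 1 Ile allele
rate_2 = rate_per_1000(188, 15793)         # Ile/Ile homozygotes
crude_ratio = rate_2 / rate_0_or_1
```

The crude ratio falls slightly below 1, the same direction as the adjusted estimate, but it carries none of the adjustments for ascertainment, birth cohort, country, or family clustering.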

Discussion 

Overall,  no  evidence  of  a  significant  association  between 
homozygosity  for  the  F31I  AURKA  polymorphism  and  breast 
cancer  risk  in  BRCA1  and  BRCA2  mutation  carriers  in 
combination  or  alone  was  observed.  These  results  were 
somewhat unexpected given the known functional relationship
between AURKA and BRCA1 (18), the known influence of
F31I  on  AURKA  protein  stability  (1),  and  the  significant 
associations  with  cancer  risk  reported  in  several  studies  of 
unselected  breast  cancer  cases  and  controls.  Although  the 
variant  does  not  seem  to  modify  predisposition  to  cancer  in 
this  combined  group  of  mutation  carriers,  the  possibility 
remains that the Ile/Ile genotype influences tumor progression
or  clinical  outcome  or  modifies  cancer  risk  in  conjunction  with 
other  risk  factors.  The  suggestion  of  a  modestly  protective 
effect  of  the  Ile/Ile  genotype  in  this  study  particularly  when 
restricting  the  study  to  incident  cases  supports  this  possibility. 
Interestingly,  a  study  of  bilateral  breast  cancer  cases  also 
identified  a  nonsignificant  protective  effect  for  the  Ile/Ile 
genotype (9). This common protective effect among individuals
at higher risk of breast cancer in the Caucasian population
suggests  that  homozygosity  for  the  F31I  polymorphism  may 
reduce  cancer  risk  in  high-risk  groups  while  possibly 
increasing  risk  in  the  general  population.  Additional  studies 
of  other  high-risk  populations  and  the  combined  effects  of 
other  risk  factors  are  needed  to  further  evaluate  these 
possibilities. 

In this study, we accounted for the effects of both bilateral
prophylactic oophorectomy and menopausal status by
treating these factors as time-dependent variables in the
analysis. As bilateral prophylactic oophorectomy is known to
reduce breast cancer risk by approximately 50% in BRCA1 and BRCA2
mutation carriers (19), we chose to account for the remaining
risk of cancer in women undergoing prophylactic oophorectomy
by assessing it as an additional time-varying covariate
rather  than  by  censoring  the  follow-up  of  the  women  at  the 
time  they  underwent  this  procedure.  In  addition,  we  did  a 
sensitivity  analysis  to  assess  the  potential  for  survival  bias  in 
our  analyses  by  restricting  the  study  to  women  more  likely  to 
have  incident  cases  of  breast  cancer.  Although  no  change  in  the 
significance  of  the  results  was  observed  following  this 
approach,  it  is  important  to  evaluate  this  possibility  in  any 
study,  whether  single  site  or  multicenter,  of  individuals  at 
significantly  elevated  risk  of  cancer. 

This report represents the largest association study conducted
to date in BRCA1 and BRCA2 carriers. It also is the first
report  from  CIMBA,  an  international  consortium  established  to 
provide  sufficient  statistical  power  to  test  candidate  single 
nucleotide  polymorphisms  as  modifiers  of  cancer  risk  in 
BRCA1  and  BRCA2  mutation  carriers  and  to  refine  breast 
cancer risk prediction in this population. The operating
principles of CIMBA are as follows. (a) CIMBA is open to
any group that can contribute genotype and phenotype
information on at least 92 BRCA1 and/or BRCA2 mutation
carriers. Groups with smaller collections of carriers are
encouraged to participate through partnership with a larger
group. (b) Phenotypic data obtained from risk factor
questionnaires and/or medical records are uniformly coded and
stored in a centralized CIMBA database. These data include
year of birth, mutation description, ethnicity, country of
residence, age at last follow-up, ages at breast and ovarian
cancer diagnosis, age at bilateral prophylactic mastectomy, age
at bilateral prophylactic oophorectomy, and status and age at
menopause. (c) Panels of single nucleotide polymorphisms for
genotyping are selected every 6 months at a CIMBA group
meeting. (d) Only single nucleotide polymorphisms that show
significant associations, either in the published literature or in
data available to a member group, at P < 0.01, are considered.
(e) Each investigator/group is free to participate or not in any
round of genotyping. (f) Genotyping quality control standards
must be followed (2% duplicates, call rates >95%, randomized
arrangement of affected and unaffected carriers for genotyping).
(g) Genotyping data from participating centers are pooled
and analyzed as outlined in the CIMBA analysis plan. This
study  represents  the  first  genetic  modifier  study  conducted  by 
CIMBA  using  these  guidelines. 
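
The numeric quality control thresholds in principle (f) lend themselves to an automated check. The sketch below is illustrative only: it reads "2% duplicates" as a minimum fraction of samples genotyped in duplicate, which is an interpretation rather than something the text states, and the function name `passes_qc` and its parameters are hypothetical.

```python
def passes_qc(n_samples, n_called, n_duplicates,
              min_call_rate=0.95, min_duplicate_fraction=0.02):
    """Check one genotyping batch against the stated thresholds.

    Call rate must exceed 95%; at least 2% of samples must be duplicated
    (the second condition is an assumed reading of "2% duplicates").
    """
    call_rate = n_called / n_samples
    duplicate_fraction = n_duplicates / n_samples
    return call_rate > min_call_rate and duplicate_fraction >= min_duplicate_fraction
```

The randomized arrangement of affected and unaffected carriers is a plate-layout requirement and is not covered by this check.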

This  study  of  7,187  BRCA1  and  BRCA2  carriers  had  80% 
power  to  detect  significant  (P  <  0.05)  protective  recessive 
effects  with  HRs  of  <0.82  for  the  F31I  allele.  We  therefore 
conclude  that  the  present  study  has  a  sufficient  sample  size  to 
assess  with  reasonable  confidence  the  involvement  of  the  F31I 
allele  in  the  modification  of  breast  cancer  risk  among  BRCA1 
and BRCA2 mutation carriers. It also shows the importance of
large  consortia,  such  as  CIMBA,  in  evaluating  the  associations 
between  genetic  markers  and  cancer  risk. 
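
The power statement above can be approximated with Schoenfeld's formula for the number of events in a Cox model with a binary covariate. The authors' power calculation method is not described, so this is a substitute technique and an assumption, and it ignores the sampling weights, stratification, and family clustering used in the actual analysis. The inputs taken from the text are 3,884 breast cancer events and 363 Ile/Ile homozygotes among 7,187 carriers; the function name `min_detectable_hr` is illustrative.

```python
import math
from statistics import NormalDist


def min_detectable_hr(n_events, exposed_fraction, alpha=0.05, power=0.80):
    """Smallest protective HR detectable under Schoenfeld's approximation.

    n_events: number of observed events.
    exposed_fraction: proportion of subjects with the risk genotype.
    Returns the HR (< 1) detectable with the given two-sided alpha and power.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    information = n_events * exposed_fraction * (1.0 - exposed_fraction)
    log_hr = (z_alpha + z_beta) / math.sqrt(information)
    return math.exp(-log_hr)


detectable = min_detectable_hr(3884, 363 / 7187)
```

Under these simplifying assumptions, the detectable HR falls close to the threshold quoted in the text.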
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Abstract 

BRCA1  and  BRCA2  mutations  exhibit  variable  penetrance  that  is 
likely  to  be  accounted  for,  in  part,  by  other  genetic  factors  among 
carriers.  However,  studies  aimed  at  identifying  these  factors  have 
been  limited  in  size  and  statistical  power,  and  have  yet  to  identify 
any  convincingly  validated  modifiers  of  the  BRCA1  and  BRCA2 
phenotype.  To  generate  sufficient  statistical  power  to  identify 
modifier  genes,  the  Consortium  of  Investigators  of  Modifiers  of 
BRCA1  and  BRCA2  (CIMBA)  has  been  established.  CIMBA 
contains  about  30  affiliated  groups  who  together  have  collected 
DNA  and  clinical  data  from  approximately  10,000  BRCA1  and 
5,000  BRCA2  mutation  carriers.  Initial  efforts  by  CIMBA  to  identify 
modifiers  of  breast  cancer  risk  for  BRCA1  and  BRCA2  mutation 
carriers  have  focused  on  validation  of  common  genetic  variants 
previously  associated  with  risk  in  smaller  studies  of  carriers  or 
unselected  breast  cancers.  Future  studies  will  involve  replication  of 
findings  from  pathway-based  and  genome-wide  association 
studies  in  both  unselected  and  familial  breast  cancer.  The 
identification of genetic modifiers of breast cancer risk for BRCA1
and  BRCA2  mutation  carriers  will  lead  to  an  improved 
understanding  of  breast  cancer  and  may  prove  useful  for  the 
determination  of  individualized  risk  of  cancer  amongst  carriers. 


The  search  for  genetic  modifiers  of  BRCA1 
and  BRCA2 

Female carriers of deleterious BRCA1 and BRCA2 mutations
are  predisposed  to  high  lifetime  risks  of  breast  and  ovarian 
cancer.  Initial  estimates  indicated  that  around  80%  of  carriers 
of  mutations  in  BRCA1  and  BRCA2  from  multiple-case 
families  would  develop  breast  cancer  by  age  70  [1,2],  and 
genetic  counseling  is  usually  carried  out  on  the  assumption 
that  penetrance  estimates  apply  to  all  women.  However,  a 
later  pooled  analysis  from  population-based  studies 
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estimated  an  average  risk  by  age  70  in  this  context  of  66%  in 
BRCA1  carriers  and  45%  in  BRCA2  carriers  [3].  It  has  also 
been  reported  that  cancer  risks  vary  by  the  age  at  diagnosis 
and  the  type  of  cancer  in  the  index  case  [3,4].  Such 
observations  are  consistent  with  the  more  plausible 
hypothesis  that  cancer  risks  in  mutation  carriers  are  modified 
by  genetic  factors  or  other  risk  factors  that  cluster  in  families. 
Segregation  analysis  has  also  demonstrated  that  models  that 
allow  for  other  genes  to  have  a  modifying  effect  on  the  breast 
cancer risks conferred by BRCA1 and BRCA2 mutations fit
significantly better than models without a modifying
component [5]. Further evidence for genetic modifiers arises
from  studies  of  risk  factors  that  are  themselves  influenced  by 
genetic  factors.  For  example,  mammographic  density  that  has 
a  strong  genetic  component  [6]  has  been  recently  shown  in 
one  study  to  modify  the  breast  cancer  risks  in  BRCA1  and 
BRCA2  mutation  carriers  [7]. 

Although  there  has  been  considerable  interest  in  finding 
genetic  modifiers  of  cancer  risk  in  BRCA1  and  BRCA2 
mutation carriers, the number of published studies is still fairly
modest and has focused on genes involved in a limited
number of pathways: detoxification of environmental carcinogens,
DNA repair and steroidogenesis. Several studies have
evaluated the CAG repeat length polymorphism in the
androgen receptor (AR) gene as a modifier of breast cancer
risk among mutation carriers. However, the data from different
studies are contradictory and no firm conclusions can be
drawn as to the magnitude of such an effect, if any [8-11].
Many studies have also evaluated a repeat length polymorphism
in AIB1 as a modifier of risk among BRCA1 or
BRCA2 mutation carriers. Although an effect of high numbers
of  repeats  on  cancer  risk  in  carriers  was  first  reported  by 
Rebbeck  and  colleagues  [12],  three  large  subsequent 
studies  failed  to  replicate  this  result  [13-15].  RAD51  currently 
provides  the  most  convincing  evidence  for  the  existence  of  a 
modifier  gene,  at  least  for  BRCA2  mutation  carriers.  Levy- 
Lahad  and  colleagues  [16]  first  reported  that  the  -135G>C 
single  nucleotide  polymorphism  (SNP)  in  the  5'  untranslated 
region  of  RAD51  modified  the  breast  cancer  risk  in  BRCA2 
carriers  and  this  finding  has  been  substantiated  by  others 
[17,18].  The  function  of  the  -135G>C  SNP  in  RAD51  is  not 
clear,  but  it  could  affect  mRNA  stability  or  translational 
efficiency. 

Choosing  candidate  SNPs  or  genes  to  evaluate  as  modifiers 
of BRCA1 and BRCA2 suffers from the same problem faced
by all candidate-based genetic association studies, namely
the poor understanding of the relevant pathways and hence
the small a priori likelihood that any of them are true modifiers
[19]. These issues may be overcome in the future through the
identification of candidate genomic regions associated with
breast cancer risk by linkage analyses [20], or more plausibly
by the identification of candidate SNPs by adequately
powered genome-wide association studies [21]. In addition,
the publication of convincingly validated SNPs associated
with breast cancer in the general population [22] will provide
some new candidates to test as modifiers of breast cancer
risk among BRCA1 or BRCA2 mutation carriers. However,
since SNPs associated with breast cancer in the general
population may not act in the same way among BRCA1 and
BRCA2  mutation  carriers,  pathway-based  and  perhaps 
genome-wide  association  studies  in  BRCA1  and  BRCA2 
carriers  are  also  needed. 

Consortium  of  Investigators  of  Modifiers  of 
BRCA1 and BRCA2 (CIMBA)

A  number  of  large  studies  and  consortia  have  been 
established  that  aim  to  identify  genetic  modifiers  of  cancer 
risk  in  BRCA1  and  BRCA2  mutation  carriers,  including 
Modifiers  and  Genetics  in  Cancer  (MAGIC),  Epidemiological 
study  of  BRCA1  and  BRCA2  mutation  carriers  (EMBRACE), 
Genetic  Modifiers  of  cancer  risk  in  BRCA1/2  mutation 
carriers  (GEMO),  the  Kathleen  Cuningham  Consortium  for 
Research  into  Familial  Breast  Cancer  (kConFab),  the  German 
Consortium  for  Hereditary  Breast  and  Ovarian  Cancer 
(GCHBOC)  and  the  Breast  Cooperative  Family  Registry 
(Breast-CFR). However, with current sample sizes of less
than 1,500 carriers, none of these groups have adequate
power to identify genetic modifiers with confidence. To
address this problem, a ‘consortium of consortia’, the
Consortium of Investigators of Modifiers of BRCA1 and
BRCA2 (CIMBA), was established in 2005 (see Additional
file 1 for a list of current contributors). The operating
principles of CIMBA are: CIMBA is open to any group that
can contribute genotypic and basic phenotypic and
epidemiological risk factor data from at least 100 female
BRCA1 and BRCA2 mutation carriers with or without a
cancer diagnosis - groups with smaller collections of carriers
are encouraged to participate through partnership with a
larger group; panels of SNPs for genotyping are selected at
face-to-face meetings every six months; only SNPs that show
significant associations (arbitrarily set at p < 0.01) with breast
cancer risk in carriers, either in the published literature or in
data from a member group, or are convincingly identified as
associated with breast cancer in the general population, are
considered; each group is free to participate, or not, in any
round of genotyping; genotyping quality control standards
must be followed (>2% duplicates, call rates >95%, no-template
controls on every plate and randomized arrangement
of affected and unaffected carriers for genotyping); all
epidemiological risk factor data and genotyping data from
carriers are submitted to the CIMBA data coordinating centre
at the University of Cambridge; and genotyping data from
participating centers are pooled for analysis. There are
currently about 30 groups from North America, Europe and
Australia who plan to contribute to some or all of the
collaborative CIMBA projects, and collectively they have DNA
and minimum required clinical and epidemiological data from
more than 10,000 BRCA1 and 5,000 BRCA2 carriers.

Statistical  considerations 

Most  association  studies  are  case-control  studies,  in  which 
genotype  frequencies  in  a  series  of  cases  are  compared  with 
those in a series of controls. The analysis of BRCA1 and
BRCA2  modifiers  is  potentially  more  complex,  because  a 
high  proportion  of  carriers  become  affected.  Thus,  modifiers 
would  be  expected  to  influence  not  just  whether  a  carrier 
became  affected  but  also  the  age  at  diagnosis.  More 
powerful  analyses  can,  therefore,  be  conducted  by  treating 
breast  cancer  as  a  survival  (age  at  onset)  rather  than  a  simple 
binary  endpoint.  An  additional  problem,  however,  is 
introduced  by  the  fact  that  mutation  carriers  are  mainly 
ascertained  through  cancer  genetics  clinics.  In  these 
settings,  the  first  tested  individual  in  a  family  is  usually 
someone  diagnosed  with  cancer  at  a  relatively  young  age. 
Such study designs tend, therefore, to lead to an oversampling
of affected individuals and standard analytical
methods  like  Cox  regression  may  lead  to  biased  estimates  of 
the  risk  ratios  [5].  CIMBA  aims  to  address  this  potential  bias 
by  using  standard  analytical  methods,  such  as  weighted  Cox 
regression,  or  by  analyzing  the  data  within  a  retrospective 
likelihood  framework  [5].  In  addition,  analyses  restricted  to 
incident  cases,  defined  as  carriers  diagnosed  with  cancer  no 
more  than  five  years  prior  to  ascertainment,  are  applied  to 
account  in  part  for  ascertainment  and  possible  survival  bias. 
One  of  the  aims  of  CIMBA  is  also  to  further  develop  the 
statistical  methodology  used  to  analyze  such  data.  Among 
BRCA1  mutation  carriers  and  at  a  threshold  of  p<  0.0001, 
CIMBA  currently  has  a  power  of  over  80%  to  detect 
polymorphisms  with  minor  allele  frequencies  greater  than 
10%  that  confer  risk  ratios  in  excess  of  1.2  (Table  1).  The 
power is somewhat lower among the current sample of
BRCA2 mutation carriers. However, it is still far greater than
the power that can be achieved by each study individually: at a
minor allele frequency of 20% and risk ratio of 1.2, the
corresponding power would be <5% for a sample size of
approximately 1,000 carriers. Moreover, most of the
participating CIMBA centers are actively recruiting carriers,
and larger sample sizes are expected in the future.

Table 1

Simulated power (%) to detect a polymorphism with varying
minor allele frequency and risk ratio, under a multiplicative
model at a significance level of 10⁻⁴

Minor allele   Relative   Sample size:   Sample size:
frequency      hazard     5,000          10,000

0.10           1.1          2              7
               1.2         33             80
               1.3         86            100
0.20           1.1          5             26
               1.2         74            100
               1.3        100            100
0.30           1.1         10             44
               1.2         89            100
               1.3        100            100

Simulations performed as in [5].
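As a rough illustration of why pooling matters, the sketch below approximates the power of a simple case-control allelic test under a multiplicative model. It is not the retrospective-likelihood simulation of [5] used for Table 1, so it is not expected to reproduce those figures. The function name `approx_power`, the equal split between affected and unaffected carriers, and the use of the minor allele frequency as the frequency among unaffected carriers are all assumptions made for this example.

```python
import math
from statistics import NormalDist


def approx_power(maf, odds_ratio, n_cases, n_controls, alpha=1e-4):
    """Approximate power of a two-sided allelic test (normal approximation).

    Assumptions (not from the source): `maf` is the allele frequency in
    unaffected carriers, the per-allele effect is multiplicative on the
    odds scale, and the test is a Wald test on the log odds ratio.
    """
    p0 = maf
    # Allele frequency in affected carriers implied by the per-allele odds ratio.
    p1 = p0 * odds_ratio / (1.0 - p0 + p0 * odds_ratio)
    # Variance of the log odds ratio from a 2x2 table of allele counts
    # (each person contributes two alleles).
    var = 1.0 / (2 * n_cases * p1 * (1.0 - p1)) \
        + 1.0 / (2 * n_controls * p0 * (1.0 - p0))
    z_effect = math.log(odds_ratio) / math.sqrt(var)
    normal = NormalDist()
    z_crit = normal.inv_cdf(1.0 - alpha / 2.0)
    # Probability of exceeding the critical value in either tail.
    return normal.cdf(z_effect - z_crit) + normal.cdf(-z_effect - z_crit)


# Hypothetical comparison: one study of ~1,000 carriers versus pooled data.
single_study = approx_power(0.20, 1.2, 500, 500)
pooled = approx_power(0.20, 1.2, 5000, 5000)
```

Under these assumptions a single study of about 1,000 carriers has power well below 5% at this allele frequency and risk ratio, while the pooled sample does far better. A survival (age at onset) analysis would be expected to be more powerful than this binary approximation.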

Conclusions 

The  identification  of  convincingly  validated  modifiers  of  breast 
cancer  risk  for  BRCA1  and  BRCA2  mutation  carriers  will 
help  to  understand  the  biology  of  hereditary  breast  tumors 
and, in the case of BRCA1-mutation-associated risk
modifiers, will also provide candidate low penetrance genes
for ‘sporadic’ basal cell breast cancers because of their
similarity to BRCA1-related breast tumors [23,24]. In the
long term it might be possible to include information on
genetic modifiers in risk prediction models, to give
individualized advice to mutation carriers on individual breast
cancer risks, and to have sufficient power to evaluate the risk
of other cancers in BRCA1 and BRCA2 mutation carriers.

Breast cancer exhibits familial aggregation, consistent with variation in genetic susceptibility to the disease. Known
susceptibility  genes  account  for  less  than  25%  of  the  familial  risk  of  breast  cancer,  and  the  residual  genetic  variance  is  likely 
to  be  due  to  variants  conferring  more  moderate  risks.  To  identify  further  susceptibility  alleles,  we  conducted  a  two-stage 
genome-wide  association  study  in  4,398  breast  cancer  cases  and  4,316  controls,  followed  by  a  third  stage  in  which  30  single 
nucleotide  polymorphisms  (SNPs)  were  tested  for  confirmation  in  21,860  cases  and  22,578  controls  from  22  studies.  We 
used 227,876 SNPs that were estimated to correlate with 77% of known common SNPs in Europeans at r² > 0.5. SNPs in five
novel independent loci exhibited strong and consistent evidence of association with breast cancer (P < 10⁻⁷). Four of these
contain plausible causative genes (FGFR2, TNRC9, MAP3K1 and LSP1). At the second stage, 1,792 SNPs were significant at the
P  <  0.05  level  compared  with  an  estimated  1,343  that  would  be  expected  by  chance,  indicating  that  many  additional  common 
susceptibility  alleles  may  be  identifiable  by  this  approach. 


Breast  cancer  is  about  twice  as  common  in  the  first-degree  relatives  of 
women  with  the  disease  as  in  the  general  population,  consistent  with 
variation  in  genetic  susceptibility  to  the  disease1.  In  the  1990s,  two 
major  susceptibility  genes  for  breast  cancer,  BRCA1  and  BRCA2,  were 
identified2,3.  Inherited  mutations  in  these  genes  lead  to  a  high  risk  of 
breast  and  other  cancers4.  However,  the  majority  of  multiple  case 
breast  cancer  families  do  not  segregate  mutations  in  these  genes. 
Subsequent  genetic  linkage  studies  have  failed  to  identify  further 
major breast cancer genes5. These observations have led to the proposal
that breast cancer susceptibility is largely ‘polygenic’: that is,
susceptibility is conferred by a large number of loci, each with a small
effect on breast cancer risk6. This model is consistent with the observed
patterns of familial aggregation of breast cancer7. However,
progress in identifying the relevant loci has been slow. As linkage
studies  lack  power  to  detect  alleles  with  moderate  effects  on  risk,  large 
case-control association studies are required. Such studies have identified
variants in the DNA repair genes CHEK2, ATM, BRIP1 and
PALB2  that  confer  an  approximately  twofold  risk  of  breast  cancer, 
but  these  variants  are  rare  in  the  population8-14.  A  recent  study  has 
shown  that  a  common  coding  variant  in  CASP8  is  associated  with  a 
moderate  reduction  in  breast  cancer  risk15.  After  accounting  for  all 
the  known  breast  cancer  loci,  more  than  75%  of  the  familial  risk  of 
the  disease  remains  unexplained16. 

Recent  technological  advances  have  provided  platforms  that  allow 
hundreds  of  thousands  of  SNPs  to  be  analysed  in  association  studies, 
thus providing a basis for identifying moderate risk alleles without
prior knowledge of position or function. It has been estimated that
there  are  7  million  common  SNPs  in  the  human  genome  (with  minor 
allele  frequency,  m.a.f.,  >5%)17.  However,  because  recombination 
tends  to  occur  at  distinct  ‘hot-spots’,  neighbouring  polymorphisms 
are  often  strongly  correlated  (in  ‘linkage  disequilibrium’,  LD)  with 
each  other.  The  majority  of  common  genetic  variants  can  therefore  be 
evaluated  for  association  using  a  few  hundred  thousand  SNPs  as  tags 
for  all  the  other  variants18.  We  aimed  to  identify  further  breast  cancer 
susceptibility  loci  in  a  three-stage  association  study19.  In  the  first 
stage, we used a panel of 266,722 SNPs, selected to tag known common
variants across the entire genome18. These SNPs were genotyped
in 408 breast cancer cases and 400 controls from the UK; data were
analysed for 390 cases and 364 controls genotyped for ≥80% of
the SNPs. The cases were selected to have a strong family history of
breast cancer, equivalent to at least two affected female first-degree
relatives, because such cases are more likely to carry susceptibility
alleles20. Initially, we analysed 227,876 SNPs (85%) with genotypes on
at least 80% of the subjects. We estimate that these SNPs are correlated
with 58% of common SNPs in the HapMap CEPH/CEU (Utah
residents with ancestry from northern and western Europe) samples
at r² > 0.8, and 77% at r² > 0.5 (mean r² = 0.75; see Supplementary
Fig. 1) (http://www.hapmap.org/)21. As expected, coverage was
strongly related to m.a.f.: 70% of SNPs with m.a.f. > 10% were tagged
at r² > 0.8, compared with 23% of SNPs with m.a.f. 5-10%. The main
analyses were restricted to 205,586 SNPs that had a call rate of at least 90%
and whose genotype distributions did not differ from Hardy-Weinberg
equilibrium in controls (at P < 10⁻⁵).
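The Hardy-Weinberg filter applied to controls can be illustrated with a minimal sketch. It uses the standard 1 d.f. chi-square goodness-of-fit test; the source does not state which test was actually used, and the function name and the example counts are hypothetical.

```python
import math


def hwe_chi2_test(n_aa, n_ab, n_bb):
    """Chi-square (1 d.f.) test of Hardy-Weinberg equilibrium.

    Takes the three genotype counts and returns (chi2, p_value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control genotype counts: the first set is exactly in
# equilibrium, the second has no heterozygotes at all.
chi2_ok, p_ok = hwe_chi2_test(100, 200, 100)
chi2_bad, p_bad = hwe_chi2_test(200, 0, 200)
keep_snp = p_bad >= 1e-5  # SNPs deviating at P < 10^-5 would be dropped
```

An exact test is often preferred when genotype counts are small; the chi-square form is used here only because it is short and self-contained.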

For the second stage we selected 12,711 SNPs, approximately 5% of
those typed in stage 1, on the basis of the significance of the difference
in genotype frequency between cases and controls. These SNPs were
then genotyped in a further 3,990 invasive breast cancer cases and
3,916 controls from the SEARCH study, using a custom-designed
oligonucleotide array. In the main analyses, we considered 10,405
SNPs with call rate of >95% that did not deviate from Hardy-Weinberg
equilibrium in controls.

Figure 1 | Quantile-quantile plots for the test statistics (Cochran-Armitage
1 d.f. χ² trend tests) for stages 1 and 2. a, Stage 1; b, stage 2. Black
dots are the uncorrected test statistics. Red dots are the statistics corrected by
the genomic control method (λ = 1.03 for stage 1, λ = 1.06 for stage 2).
Under the null hypothesis of no association at any locus, the points would be
expected to follow the black line.
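The 1 d.f. Cochran-Armitage trend test used for the stage 1 and 2 statistics can be sketched as follows. This is the textbook form with additive genotype scores 0, 1, 2, not the authors' code; the function name and the example counts are hypothetical.

```python
import math


def cochran_armitage_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test for a 2 x 3 genotype table.

    `cases` and `controls` are genotype counts ordered by number of
    copies of the tested allele. Returns (chi2, p_value) on 1 d.f.
    """
    n_cases = sum(cases)
    n_controls = sum(controls)
    n_total = n_cases + n_controls
    col_totals = [r + s for r, s in zip(cases, controls)]
    # Trend statistic: weighted difference between cases and controls.
    t = sum(w * (r * n_controls - s * n_cases)
            for w, r, s in zip(weights, cases, controls))
    # Corrected sum of squares of the genotype scores.
    sxx = sum(w * w * n for w, n in zip(weights, col_totals)) \
        - sum(w * n for w, n in zip(weights, col_totals)) ** 2 / n_total
    var_t = n_cases * n_controls * sxx
    chi2 = t * t / var_t
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square upper tail, 1 d.f.
    return chi2, p_value


# Hypothetical genotype counts (0, 1, 2 copies of the tested allele).
chi2_null, p_null = cochran_armitage_trend((100, 200, 100), (100, 200, 100))
chi2_alt, p_alt = cochran_armitage_trend((10, 20, 30), (30, 20, 10))
```

The statistic equals the total sample size times the squared correlation between case status and genotype score, which is why it has a single degree of freedom.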

Comparison of the observed and expected distribution of test statistics
showed some evidence for an inflation of the test statistics in both
stage 1 (inflation factor λ = 1.03, 95% confidence interval (CI) 1.02-1.04)
and stage 2 (λ = 1.06, 95% CI 1.04-1.12), based on the 90% least
significant SNPs (Fig. 1). Possible explanations for this inflation
include population stratification, cryptic relatedness among subjects,
and differential genotype calling between cases and controls. There
was evidence for an excess of low call rate SNPs among the most
significant SNPs (P < 0.01) in stage 1, but not in stage 2, suggesting
that some of this effect is a genotyping artefact (Supplementary Table
1). However, the inflation was still present among SNPs with call rate
>99% in both cases and controls, possibly reflecting population substructure.
We computed 1 degree of freedom (d.f.) association tests for
each SNP, combining stages 1 and 2. After adjustment for this inflation
by the genomic control method22, we observed more associations than
would have been expected by chance at P < 0.05 (Table 1). One SNP
(dbSNP rs2981582) was significant at the P < 10⁻⁷ level that has been
proposed  as  appropriate  for  genome-wide  studies23. 
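The genomic control adjustment can be illustrated with a short sketch. It uses the common median-based estimate of the inflation factor, which is an assumption: the estimate reported above was based on the 90% least significant SNPs. The function name and the input values are hypothetical.

```python
import statistics

# Median of a chi-square distribution with 1 d.f.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Estimate the inflation factor and rescale 1 d.f. test statistics.

    Returns (lambda, adjusted statistics). Lambda is floored at 1 so
    that statistics are never inflated by the correction.
    """
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)
    return lam, [x / lam for x in chi2_stats]


# Hypothetical statistics whose median is inflated by 6%.
inflated = [0.10, 0.30, CHI2_1DF_MEDIAN * 1.06, 1.50, 4.00]
lam, adjusted = genomic_control(inflated)
```

Dividing every statistic by the same factor leaves the ranking of SNPs unchanged and only makes each P value more conservative.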

In  the  third  stage,  to  establish  whether  any  SNPs  were  definitely 
associated  with  risk,  we  tested  30  of  the  most  significant  SNPs  in  22 
additional  case-control  studies,  comprising  21,860  cases  of  invasive 
breast  cancer,  988  cases  of  carcinoma  in  situ  (CIS)  and  22,578  controls 
(Supplementary  Table  2).  Six  SNPs  showed  associations  in  stage  3  that 
were significant at P ≤ 10⁻⁵ with effects in the same direction as in
stages 1 and 2 (Table 2, Supplementary Table 3, and Fig. 2). All these
SNPs reached a combined significance level of P < 10⁻⁷ (ranging from
2 × 10⁻⁷⁶ to 3 × 10⁻⁹). Of these six SNPs, five were within genes or
LD blocks containing genes. SNP rs2981582 lies in intron 2 of FGFR2
(also known as CEK3), which encodes the fibroblast growth factor
receptor 2. SNPs rs12443621 and rs8051542 are both located in an
LD block containing the 5' end of TNRC9 (also known as TOX3), a
gene of uncertain function containing a tri-nucleotide repeat motif, as
well as the hypothetical gene, LOC643714. SNP rs889312 lies in an LD
block of approximately 280 kb that contains MAP3K1 (also known as
MEKK), which encodes the signalling protein mitogen-activated protein
kinase kinase kinase 1, in addition to two other genes: MGC33648
and MIER3. SNP rs3817198 lies in intron 10 of LSP1 (also known as
WP43), encoding lymphocyte-specific protein 1, an F-actin bundling
cytoskeletal protein expressed in haematopoietic and endothelial cells.
A further SNP, rs2107425, located just 110 kilobases (kb) from
rs3817198, was also identified (overall P = 0.00002). rs2107425 is
within the H19 gene, an imprinted maternally expressed untranslated
messenger RNA closely involved in regulation of the insulin growth
factor gene, IGF2. In stage 3, however, rs2107425 was only weakly
significant after adjustment for rs3817198 by logistic regression
(P = 0.06). This suggests that the association with breast cancer risk
may be driven by variants in LSP1 rather than in H19. The sixth SNP
reaching a combined P < 10⁻⁷ was rs13281615, which lies on 8q. It is
correlated with SNPs in a 110 kb LD block that contains no known
genes. The basis of this association therefore remains obscure. This
SNP is approximately 130 kb proximal to rs1447295, 60 kb proximal
to rs6983267 and 230 kb distal to rs16901979, recently shown to be
associated with prostate cancer24-26.

Table 1 | Number of significant associations after stage 2

Level of significance   Observed   Observed adjusted*   Expected   Ratio
0.01-0.05               1,239      1,162                934.3      1.24
0.001-0.01              574        517                  347.6      1.49
0.0001-0.001            112        88                   53.3       1.65
0.00001-0.0001          16         12                   7.0        1.71
<0.00001                15         13                   0.96       13.5
All P < 0.05            1,956      1,792                1,343.2    1.33

Observed numbers of SNPs associated with breast cancer after stage 2, by level of significance,
before and after adjustment for population stratification, and expected numbers under the null
hypothesis of no association.

* Adjusted for inflation of the test statistic by the genomic control method.

Table 2 | Summary of results for eleven SNPs selected for stage 3 that showed evidence of an association with breast cancer

rs number     Gene              Chr   Position*    m.a.f.†       Per allele OR      HetOR              HomOR              P-trend       P-trend      P-trend
                                                                 (95% CI)           (95% CI)           (95% CI)           Stages 1+2    Stage 3      Combined
rs2981582     FGFR2             10q   123342307    0.38 (0.30)   1.26 (1.23-1.30)   1.23 (1.18-1.28)   1.63 (1.53-1.72)   4 × 10⁻¹⁶     5 × 10⁻⁶²    2 × 10⁻⁷⁶
rs12443621    TNRC9/LOC643714   16q   51105538     0.46 (0.60)   1.11 (1.08-1.14)   1.14 (1.09-1.20)   1.23 (1.17-1.30)   10⁻⁷          9 × 10⁻¹⁴    2 × 10⁻¹⁹
rs8051542     TNRC9/LOC643714   16q   51091668     0.44 (0.20)   1.09 (1.06-1.13)   1.10 (1.05-1.16)   1.19 (1.12-1.27)   4 × 10⁻⁶      4 × 10⁻⁸     [illegible in source]
rs889312      MAP3K1            5q    56067641     0.28 (0.54)   1.13 (1.10-1.16)   1.13 (1.09-1.18)   1.27 (1.19-1.36)   4 × 10⁻⁶      3 × 10⁻¹⁵    7 × 10⁻²⁰
rs3817198     LSP1              11p   1865582      0.30 (0.14)   1.07 (1.04-1.11)   1.06 (1.02-1.11)   1.17 (1.08-1.25)   8 × 10⁻⁶      10⁻⁵         3 × 10⁻⁹
rs2107425     H19               11p   1977651      0.31 (0.44)   0.96 (0.93-0.99)   0.94 (0.90-0.98)   0.95 (0.89-1.01)   7 × 10⁻⁶      0.01         2 × 10⁻⁵
rs13281615    -                 8q    128424800    0.40 (0.56)   1.08 (1.05-1.11)   1.06 (1.01-1.11)   1.18 (1.10-1.25)   2 × 10⁻⁷      6 × 10⁻⁷     5 × 10⁻¹²
rs981782      -                 5p    45321475     0.47 (0.37)   0.96 (0.93-0.99)   0.96 (0.92-1.01)   0.92 (0.87-0.97)   8 × 10⁻⁵      0.003        9 × 10⁻⁶
rs30099       -                 5q    52454339     0.08 (0.39)   1.05 (1.01-1.10)   1.06 (1.00-1.11)   1.09 (0.96-1.24)   0.003         0.02         0.001
rs4666451     -                 2p    19150424     0.41 (0.04)   0.97 (0.94-1.00)   0.98 (0.93-1.02)   0.93 (0.87-0.99)   5 × 10⁻⁶      0.04         6 × 10⁻⁵
rs3803662‡    TNRC9/LOC643714   16q   51143842     0.25 (0.60)   1.20 (1.16-1.24)   1.23 (1.18-1.29)   1.39 (1.26-1.45)   3 × 10⁻¹²     10⁻²⁶        10⁻³⁶

OR, odds ratio; HetOR, odds ratio in heterozygotes; HomOR, odds ratio in rare homozygotes (relative to common homozygotes); CI, confidence interval.

* Build 36.2 position.

† Minor allele frequency in SEARCH (UK) study. Combined allele frequency from three Asian studies in parentheses (italics in the original).

‡ rs3803662 was not part of the initial tag SNP set but identified as a result of fine-scale mapping of the TNRC9/LOC643714 locus and typed in the stage 2 and stage 3 sets (but not the stage 1 set).

In  addition  to  the  seven  SNPs  described  above,  there  was  evidence 
of  association  among  the  remaining  23  SNPs  (global  P  =  0.001  in 
stage 3). In particular, three SNPs showed some evidence of association
in stage 3 (P < 0.05, in each case in the same direction as in
stages 1 and 2; Table 2). SNPs rs981782 and rs30099 both lie in the
centromeric region of chromosome 5. rs4666451 lies on 2p, a region
for which some evidence of linkage to breast cancer in families has
been reported5. The 20 other SNPs showed no evidence of association
in stage 3 (global P = 0.11), suggesting that most of these associations
from  stages  1  and  2  were  false  positives. 


FGFR2 

The most significantly associated SNP, rs2981582, lies within a 25 kb LD
block almost entirely within intron 2 of FGFR2. We found no evidence
of association with SNPs elsewhere in the gene (Fig. 3a). In an attempt to
identify a causal variant, we first identified the 19 common variants
(m.a.f. > 0.05) in this block from HapMap CEU data. These were tagged
(r² > 0.8) by 7 SNPs including rs2981582. The additional tag SNPs were
genotyped in the SEARCH study cases and controls. Multiple logistic
regression analysis of these variants found no additional evidence for
association after adjusting for rs2981582. Haplotype analysis of these 7
SNPs indicated that multiple haplotypes carrying the minor (a) allele of
rs2981582 were associated with an increased risk of breast cancer, implying
that the association was being driven by rs2981582 itself or a variant
strongly correlated with it (Supplementary Table 4).
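The tagging criterion (r² > 0.8) rests on the pairwise correlation between two SNPs. A minimal sketch of that calculation from haplotype and allele frequencies is given below; the function names and the example frequencies are hypothetical, and real tag selection works from phased HapMap haplotypes rather than hand-entered values.

```python
def ld_r2(p_ab, p_a, p_b):
    """Squared correlation r^2 between two biallelic SNPs.

    p_ab is the frequency of the haplotype carrying allele A at the
    first SNP and allele B at the second; p_a and p_b are the
    corresponding allele frequencies.
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


def is_tagged(r2, threshold=0.8):
    """True when a variant is captured by a tag SNP at the given r^2."""
    return r2 > threshold


# Hypothetical frequencies: perfect correlation, then independence.
r2_perfect = ld_r2(0.30, 0.30, 0.30)
r2_none = ld_r2(0.09, 0.30, 0.30)
```

Two SNPs reach r² = 1 only when they have the same allele frequency and always occur together, which is why closely correlated variants cannot be separated by association data alone.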


[Figure 2: forest plots, panels a-e. Rows, top to bottom: Stage 1, Stage 2,
ABCFS, KConFab/AOC, MCCS, SASBCS, CNIOBCS, CGPS, GENICA, HBCS, HBCP, KBCP,
LUMCBCS, RBCS, NCIPBCS, SEARCH3, SBCS, MCBCS, NHS, USRTS, MEC-W, European
(summary), MEC-J, TBCS, SBCP, Asian (summary), Total. x-axis: per-allele
odds ratio, 0.8 to 1.8.]

Figure 2 | Forest plots of the per-allele odds ratios for each of the five SNPs
reaching genome-wide significance. a, rs2981582; b, rs3803662; c, rs889312;
d, rs13281615; and e, rs3817198. The x-axis gives the per-allele odds ratio.
Each row represents one study (see Supplementary Table 2), with summary
odds ratios for all European and all Asian studies, and all studies combined.
The area of the square for each study is proportional to the inverse of the
variance of the estimate. Horizontal lines represent 95% confidence
intervals. Diamonds represent the summary odds ratios, with 95%
confidence intervals, based on the stage 3 studies only.

Resequencing of this region in 45 subjects of European origin
identified 29 variants that were strongly correlated with rs2981582
(r² > 0.6) (http://cgwb.nci.nih.gov; Fig. 3b and Supplementary
Tables 5-8). A subset of 14 variants tagged 27 of these in European
(r² > 0.95) and Asian (Korean) samples (r² > 0.86). Two variants
could not be genotyped reliably. This new tagging set was then genotyped
in SEARCH and 3 studies from Asian populations; the Asian
studies  were  included  because  the  LD  is  weaker,  providing  greater 
power  to  resolve  the  causal  variant  (Fig.  3b,  left  panel).  The  strongest 
association  was  found  with  rs7895676.  On  the  assumption  that  there 
is  a  single  disease-causing  allele,  we  calculated  a  likelihood  for  each 
variant. 21 SNPs (including rs2981582) had a likelihood ratio of <1/100
relative to rs7895676, indicating that none of these are likely to be
the  causal  variant  (Supplementary  Table  8).  Six  variants  were  too 
strongly  correlated  for  their  individual  effects  to  be  separated  using 
a  genetic  epidemiological  approach.  Functional  assays  will  be 
required  to  determine  which  is  causally  related  to  breast  cancer  risk. 
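The exclusion rule described above (a likelihood ratio below 1/100 relative to the most strongly associated variant) can be sketched as a simple filter on log-likelihoods. The function name, the variant labels and the log-likelihood values are hypothetical; the source does not give the fitted likelihoods, which are reported in Supplementary Table 8.

```python
import math


def split_by_likelihood_ratio(log_likelihoods, min_ratio=1.0 / 100.0):
    """Split variants into (retained, excluded) by likelihood ratio.

    Each variant's likelihood is compared with the best-fitting variant.
    Variants whose ratio falls below `min_ratio` are excluded as
    unlikely to be the single causal variant.
    """
    best = max(log_likelihoods.values())
    log_cutoff = math.log(min_ratio)
    retained, excluded = [], []
    for name, log_lik in log_likelihoods.items():
        if log_lik - best >= log_cutoff:
            retained.append(name)
        else:
            excluded.append(name)
    return retained, excluded


# Hypothetical log-likelihoods under a single-causal-variant model.
example = {
    "variant_top": -1000.0,
    "variant_close": -1002.0,  # ratio exp(-2), about 0.14: retained
    "variant_far": -1010.0,    # ratio exp(-10): excluded
}
retained, excluded = split_by_likelihood_ratio(example)
```

Working on the log scale avoids numerical underflow when the likelihoods themselves are very small.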

Intron 2 of FGFR2 shows a high degree of conservation in mammals,
and contains several putative transcription-factor binding sites
(http://genomequebec.mcgill.ca/PReMod)27, some of which lie in
close proximity to the relevant SNPs. We therefore speculate that
the association with breast cancer risk is mediated through regulation
of FGFR2 expression. Of possible relevance is that only three of these
variants (rs10736303, rs2981578 and rs35054928) are within
sequences conserved across all placental mammals (Fig. 3c and
Supplementary Table 8). Of these, the disease associated allele of
rs10736303 generates a putative oestrogen receptor (ER) binding site.
rs35054928 lies immediately adjacent to a perfect POU domain protein
octamer (Oct) binding site. However, multiple splice variants
have been reported in FGFR2, and differential splicing might provide
an alternative mechanism for the association. FGFR2 is a receptor
tyrosine kinase that is amplified and overexpressed in 5-10% of
breast tumours28-30. Somatic missense mutations of FGFR2 that are
likely to be implicated in cancer development have also been demonstrated
in primary tumours and cell lines of multiple tumour types
(http://www.sanger.ac.uk/genetics/CGP/cosmic/)30,31.

TNRC9/LOC643714  locus 

As two SNPs in the TNRC9/LOC643714 locus, rs12443621 and
rs8051542, both showed convincing evidence of association, we further
evaluated this region by genotyping, in the SEARCH set, an additional
19 SNPs tagging 101 common variants within the entire TNRC9 and
LOC643714 genes, based on the HapMap CEU data. SNPs tagging the
coding region of TNRC9 showed no evidence of association. The strongest
association was observed with rs3803662, a synonymous coding
SNP of LOC643714 that lies 8 kb upstream of TNRC9. This SNP was
therefore genotyped in the stage 3 set (Table 2). Logistic regression
analysis indicated that rs3803662 exhibited a stronger association with
disease than other SNPs, and the associations with other SNPs were
non-significant after adjustment for rs3803662. These results suggest
that the causal variant is closely correlated with rs3803662. Four SNPs
in the HapMap CEU data (rs17271951, rs1362548, rs3095604 and
rs4784227) that span LOC643714 and the 5' regulatory regions of
TNRC9 are strongly correlated with rs3803662, and it therefore
remains unclear in which gene the causative variant lies. TNRC9 contains
a putative HMG (high mobility group) box motif, suggesting that
it might act as a transcription factor.

Figure 3 | The FGFR2 locus. a, Map of the whole
FGFR2 gene, viewed relative to common SNPs on
HapMap. The gene is 126 kb long and in reverse
3'-5' orientation on chromosome 10. Exon
positions are illustrated with respect to the 67
SNPs with m.a.f. > 5% in HapMap CEU
(therefore the map is not to physical scale).
Numbered SNPs are those tested in the genome-wide
study. SNPs in black were not significant in
stage 1. Those in red were significant at
P < 0.0001 after stage 2. rs10510097 (orange) was
significant in stage 1, but failed quality control in
stage 2 owing to deviation from Hardy-Weinberg
equilibrium. Squares indicate pairwise r² on a
greyscale (black = 1, white = 0). Red circle
indicates rs2981582. b, Resequenced 32 kb
region, shown relative to SNPs in CEU with
m.a.f. > 5%, showing pairwise LD for SNPs in
HapMap CEU (left panel) and JPT/CHB (right
panel). Red circle indicates rs2981582, shown in
bold black. c, Sequence conservation of 32 kb
region in five species, relative to human sequence
(http://pipeline.lbl.gov/methods.shtml)35. Red
circle indicates rs2981582. SNPs in grey are those
used in the initial tagging of known common
HapMap SNPs within the block. SNPs in black
are correlated with rs2981582 with r² > 0.6 in
European samples. Six SNPs in red were those
consistent with being the causative variant on the
basis of the genetic data (not excluded at odds of
100:1 relative to the SNP with the strongest
association, rs7895676).

©2007 Nature Publishing Group
NATURE | Vol 447 | 28 June 2007

Pattern  of  risks 

We assessed in more detail, in the stage 3 data, the pattern of the
risks associated with the five independent SNPs that reached an overall
P < 10⁻⁷: rs2981582 (FGFR2), rs3803662 (TNRC9/LOC643714),
rs889312 (MAP3K1), rs13281615 (8q) and rs3817198 (LSP1). For each
of these five SNPs, the minor allele in Europeans was associated with an
increased risk of breast cancer in a dose-dependent manner, with a
higher risk of breast cancer in homozygous than in heterozygous carriers.
Simple dominant and recessive models could be rejected for each
SNP (all P = 0.02 or less). There was a marked difference in allele
frequencies between populations, with the risk-associated alleles of
rs8051542, rs889312 and rs13281615 being the major allele in Asian
populations. The per-allele odds ratio associated with rs2981582 was
significantly smaller, though still elevated, in the Asian versus European
populations (P = 0.04 for difference in odds ratio). This difference is
consistent with the hypothesis that rs2981582 is not the functional
variant at the FGFR2 locus, and was not seen for SNPs exhibiting stronger
evidence in the fine-scale mapping. No other evidence for heterogeneity
in the per-allele odds ratio among studies was observed (Fig. 2).

Three  of  the  SNPs  (rs2981582,  rs3803662  and  rs889312)  also 
showed  evidence  of  association  with  breast  CIS  (Supplementary 
Table 9). For rs2981582 and rs3803662, the estimated odds ratios were
greater  for  a  diagnosis  of  breast  cancer  before  age  40  years,  but  the 
trends  by  age  were  not  statistically  significant  (Supplementary  Table 
10).  There  was  evidence  of  an  association  with  family  history  of  breast 
cancer  for  three  SNPs:  for  rs2981582  (P  =  0.02),  rs3803662  (P  =  0.03) 
and rs13281615 (P = 0.05), the susceptibility allele was commoner in
women  with  a  first-degree  relative  with  the  disease  than  in  those 
without  (Supplementary  Table  11).  rs2981582  was  also  associated 
with  bilaterality  (P  =  0.02).  The  associations  with  family  history  and 
bilaterality  are  to  be  expected  for  susceptibility  loci,  and  are  similar  to 
previous  observations  for  alleles  in  CHEK2  and  ATM  (refs  10, 12, 14). 

Discussion 

This  study  has  identified  five  novel  breast  cancer  susceptibility  loci, 
and  demonstrated  conclusively  that  some  of  the  variation  in  breast 
cancer  risk  is  due  to  common  alleles.  None  of  the  loci  we  identified 
had  been  previously  reported  in  association  studies.  Most  previously 
identified  breast  cancer  susceptibility  genes  are  involved  in  DNA 
repair, and many association studies in breast cancer have concentrated
on genes in DNA repair and sex hormone synthesis and metabolism
pathways. None of the associations reported here appear to
relate to genes in these pathways. It is notable that three of the five loci
contain genes related to control of cell growth or to cell signalling, but
only one (FGFR2) had a clear prior relevance to breast cancer. These
results  should,  therefore,  open  up  new  avenues  for  basic  research. 

Our results emphasize the critical importance of study size in genetic
association studies. It is notable that none of the confirmed associations
reached genome-wide significance after stage 1 and only one
reached this level after stage 2. As most common cancers have similar
familial relative risks to breast cancer, it is likely that similarly large
studies will be required to identify common alleles for other cancers.
The fine-scale mapping of the FGFR2 locus demonstrates that, even
with a clear association, identification of the causative variant can be
extremely problematic. However, the use of studies from multiple
populations with different patterns of LD can substantially reduce
the number of variants that need to be subjected to functional analysis.

As these susceptibility alleles are very common, a high proportion of
the general population are carriers of at-risk genotypes. For example,
approximately 14% of the UK population and 19% of UK breast
cancer cases are homozygous for the rare allele at rs2981582. On the
other hand, the increased risks associated with these alleles are relatively
small: on the basis of UK population rates, the estimated breast
cancer risk by age 70 years for rare homozygotes at rs2981582 is 10.5%,
compared to 6.7% in heterozygotes and 5.5% in common homozygotes.
At this stage, it is unlikely that these SNPs will be appropriate for
predictive genetic testing, either alone or in combination with each
other. However, as further susceptibility alleles are identified, a combination
of such alleles together with other breast cancer risk factors
may become sufficiently predictive to be important clinically.

On the basis of the relative risk estimates from stage 3, and assuming
that the five most significant loci interact multiplicatively on disease
risk, these loci explain an estimated 3.6% of the excess familial risk of
breast cancer. On the basis of our staged design and the estimated
distribution of linkage disequilibrium between the typed SNPs and
those in HapMap, we estimate that the power to identify the five most
significant associations at P < 10⁻⁷ (rs2981582, rs3803662, rs889312,
rs13281615 and rs3817198) was 93%, 71%, 25%, 3% and 1%, respectively.
These estimates are uncertain, notably because the true coverage
of HapMap SNPs is unknown. Nevertheless, these calculations indicate
that the power to detect the two strongest associations was high, and
suggest that there are likely to be few other common variants with a
similar effect on variation in breast cancer risk to rs2981582. In contrast,
the low power to detect rs13281615 and rs3817198 suggests that
these variants may represent a much larger class of loci, each explaining
of the order of 0.1% of the familial risk of breast cancer. An example of
such a locus is provided by CASP8 D302H, which showed strong
evidence of association in a previous large study (ref. 15). This SNP was tested
in stage 1, but the association was missed because it did not reach the
threshold for testing in stage 2. The excess of associations after stage 2 is
also consistent with the existence of many such loci. In addition,
because the coverage for SNPs with m.a.f. < 10% was low, many low
frequency alleles may have been missed. The detection of further susceptibility
loci will require genome-wide studies with more complete
coverage and using larger numbers of cases and controls, together with
the combination of results across multiple studies. The present study
demonstrates that common susceptibility loci can be reliably identified,
and that they may together explain an appreciable fraction of
the genetic variance in breast cancer risk.

METHODS  SUMMARY 

Cases  for  stage  1  were  identified  through  clinical  genetics  centres  in  the  UK  and  a 
national  study  of  bilateral  breast  cancer.  Cases  in  stage  2  were  drawn  from  a 
population-based study of breast cancer (SEARCH; ref. 32). Controls for stages 1 and 2
were drawn from EPIC-Norfolk, a population-based study of diet and cancer (ref. 33).

Cases  and  controls  for  stage  3  were  identified  through  case-control  studies  in 
Europe,  North  America,  South-East  Asia  and  Australia  participating  in  the 
Breast Cancer Association Consortium (Supplementary Table 2; ref. 34).

Genotyping for stages 1 and 2 was conducted using high-density oligonucleotide
microarrays. For the main analyses, we excluded samples called on <80% of
SNPs in either stage. We also excluded SNPs that achieved a call rate of <90% in
stage 1 and <95% in stage 2, and SNPs whose frequency deviated from Hardy-Weinberg
equilibrium in controls at P < 0.00001. Genotyping for stage 3, and for
the fine-scale mapping of the FGFR2 locus, was conducted using either a 5'
nuclease assay (Taqman, Applied Biosystems) or MALDI-TOF mass spectrometry
using the Sequenom iPLEX system. For each centre, we excluded any
sample called on <80% of SNPs, and any SNP with a call rate of <95% or a
deviation from Hardy-Weinberg equilibrium in controls at P < 0.00001. Tests
of association were 1 d.f. Cochran-Armitage tests, stratified for stage, centre and
ethnic group (European or Asian). Odds ratios for each SNP were estimated
using stratified logistic regression, using the stage 3 data only.

Full  Methods  and  any  associated  references  are  available  in  the  online  version  of 
the paper at www.nature.com/nature.

METHODS

Subjects.  Cases  in  stage  1  were  identified  through  clinical  genetics  centres  in 
Cambridge (n = 91), Manchester (96) and Southampton (136), and a national
study  of  bilateral  breast  cancer  (85).  Cases  were  women  diagnosed  with  invasive 
breast  cancer  under  the  age  of  60  years  who  had  a  family  history  score  of  at  least  2, 
where  the  score  was  computed  as  the  total  number  of  first-degree  relatives  plus 
half  the  number  of  second-degree  relatives  affected  with  breast  cancer.  The  score 
for  women  with  bilateral  breast  cancer  was  increased  by  1,  so  that  women  were 
eligible  if  they  were  diagnosed  with  bilateral  breast  cancer  and  had  one  affected 
first-degree  relative.  Cases  known  to  carry  a  BRCA1  or  BRCA2  mutation  were 
excluded. Controls were selected from the EPIC-Norfolk study, a population-based
cohort study of diet and cancer based in Norfolk, East Anglia, UK (ref. 33).
Controls  were  chosen  to  be  women  aged  over  50  years  and  free  of  cancer  at 
the  time  of  entry.  Genotyping  was  attempted  on  408  cases,  plus  32  duplicate 
case  samples,  and  400  controls.  For  the  analysis  in  Table  1,  54  samples  with 
genotype  call  rates  <80%  were  excluded,  so  the  final  analyses  were  based  on 
390  cases  and  364  controls.  The  minimum  genotype  call  rate  for  the  remaining 
samples  was  89%.  The  overall  genotype  discordance  rate  between  duplicate 
samples  in  stage  1  was  0.01%. 
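
The stage 1 eligibility rule described above can be written out as a short check. This is a minimal sketch, not the study's own code: the function and argument names are hypothetical, and the rule is encoded only as it is stated in the text (score of at least 2, diagnosis under 60 years, known BRCA1/BRCA2 carriers excluded).

```python
def family_history_score(first_degree, second_degree, bilateral=False):
    """Affected first-degree relatives plus half the affected second-degree
    relatives, increased by 1 for women with bilateral breast cancer."""
    score = first_degree + 0.5 * second_degree
    if bilateral:
        score += 1.0
    return score


def eligible_for_stage1(age_at_diagnosis, first_degree, second_degree,
                        bilateral=False, known_brca_carrier=False):
    """Apply the stage 1 case definition as stated in the text."""
    if known_brca_carrier:
        return False  # known BRCA1/BRCA2 mutation carriers were excluded
    score = family_history_score(first_degree, second_degree, bilateral)
    return age_at_diagnosis < 60 and score >= 2


# A bilateral case with one affected first-degree relative is eligible.
print(eligible_for_stage1(45, first_degree=1, second_degree=0, bilateral=True))
```

The bilateral adjustment is what makes a woman with a single affected first-degree relative eligible, as the paragraph notes.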

For stage 2, invasive breast cancer cases were drawn from SEARCH, a population-based
study of cancer in East Anglia (ref. 32). Controls were women selected
from the EPIC-Norfolk study, as previously described (ref. 33). Eighty-eight subjects
who  were  also  genotyped  in  stage  1,  and  35  controls  who  subsequently  developed 
breast  cancer  and  were  also  in  the  case  series,  were  excluded  from  the  analysis, 
leaving  3,990  breast  cancer  cases  and  3,916  controls,  plus  five  duplicates.  The 
overall  rate  of  discordance  of  genotypes  between  duplicate  samples  in  stage  2  was 
0.008%. 

Twenty-one  additional  studies  were  included  in  stage  3  (see  Supplementary 
Table  2).  These  studies  participated  through  the  Breast  Cancer  Association 
Consortium, an ongoing collaboration among investigators conducting case-control
association studies in breast cancer (refs 15, 33). All studies provided information
on  disease  status  (invasive  breast  cancer,  carcinoma  in  situ  or  control),  age  at 
diagnosis/observation,  ethnic  group,  first-degree  family  history  of  breast  cancer 
and  bilaterality  of  breast  cancer.  One  further  study  (Breast  Cancer  Study  of 
Taiwan) was included in the fine-scale mapping of the FGFR2 locus.
Genotyping. For stage 1, genotyping was performed on 200 ng DNA that was
first subjected to whole genome amplification using Multiple Displacement
Amplification (MDA; ref. 36). Samples were then genotyped for a set of 266,732
SNPs using high-density oligonucleotide, photolithographic microarrays at
Perlegen Sciences. For stage 2, genotyping was performed using 2.5 µg genomic
DNA. These samples were genotyped for a set of 13,023 SNPs selected on the
basis of the stage 1 results, using a custom designed oligonucleotide array. For
both stages, each SNP was interrogated by 24 25-mer oligonucleotide probes
synthesized by photolithography on a glass substrate. The 24 features comprise 4
sets of 6 features interrogating the neighbourhoods of SNP reference and alternative
alleles on forward and reverse strands. Each allele and strand is represented
by five offsets: -2, -1, 0, 1 and 2, indicating the position of the SNP within the
25-mer, with zero being at the thirteenth base. At offset 0 a quartet was tiled,
which included the perfect match to reference and alternative SNP alleles, and
the two remaining nucleotides as mismatch probes. When possible, the mismatch
features were selected as a purine nucleotide substitution for a purine
perfect match nucleotide and a pyrimidine nucleotide substitution for a pyrimidine
perfect match nucleotide. Thus, each strand and allele tiling consisted of
6 features comprising five perfect match probes and one mismatch.

Individual genotypes were determined by clustering all SNP scans in the two-dimensional
space defined by reference and alternative trimmed mean intensities,
corrected for background. Allele frequencies were approximated using the
intensities collected from the high-density oligonucleotide arrays. An SNP's
allele frequency, p, was estimated as the ratio of the relative amount of the
DNA with reference allele to the total amount of DNA. The value of p was computed
from the trimmed mean intensities of perfect match features, after subtracting
tracting  a  measure  of  background  computed  from  trimmed  means  of  intensities 
of  mismatch  features.  The  trimmed  mean  disregarded  the  highest  and  the  lowest 
intensity  from  the  five  perfect  match  intensities  before  computing  the  arithmetic 
mean.  For  the  mismatch  features,  the  trimmed  mean  is  the  individual  intensity  of 
the  specified  mismatch  feature. 
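
The intensity calculation in this paragraph can be illustrated as follows. This is a sketch under stated assumptions rather than the Perlegen implementation: the function names are hypothetical, signals are clipped at zero after background subtraction (the text does not say how negative values were handled), and a single strand is shown.

```python
def trimmed_mean(perfect_match):
    """Mean of the five perfect-match intensities after discarding the
    highest and the lowest value."""
    if len(perfect_match) != 5:
        raise ValueError("expected five perfect-match intensities")
    kept = sorted(perfect_match)[1:-1]
    return sum(kept) / len(kept)


def allele_fraction(ref_pm, alt_pm, ref_mismatch, alt_mismatch):
    """Estimate p = reference signal / (reference + alternative signal).

    For mismatch features the 'trimmed mean' is the single mismatch
    intensity, used here as the background measure.
    """
    # Assumption: negative background-corrected signals are set to zero.
    ref_signal = max(trimmed_mean(ref_pm) - ref_mismatch, 0.0)
    alt_signal = max(trimmed_mean(alt_pm) - alt_mismatch, 0.0)
    total = ref_signal + alt_signal
    if total == 0:
        return None  # no usable signal for this scan
    return ref_signal / total


# Illustrative intensities only (not data from the study).
p = allele_fraction([10, 110, 100, 90, 500], [25, 30, 40, 50, 60], 20, 20)
print(p)
```

Dropping the two extreme intensities makes the estimate less sensitive to a single saturated or failed probe, which is presumably why a trimmed rather than plain mean was used.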

The  genotype  clustering  procedure  was  an  iterative  algorithm  developed  as  a 
combination  of  K-means  and  constrained  multiple  linear  regressions.  The 
K-means  at  each  step  re-evaluated  the  cluster  membership  representing  distinct 
diploid  genotypes.  The  multiple  linear  regressions  minimized  the  variance  in  p 
within  each  cluster  while  optimizing  the  regression  lines’  common  intersect.  The 
common  intersect  defined  a  measure  of  common  background  that  was  used  to 
adjust  the  allele  frequencies  for  the  next  step  of  K-means.  The  K-means  and 
multiple linear regression steps were iterated until the cluster membership and
background estimates converged. The best number of clusters was selected by
maximizing  the  total  likelihood  over  the  possible  cluster  counts  of  1,  2  and  3 
(representing  the  combinations  of  the  three  possible  diploid  genotypes).  The 
total  likelihood  was  composed  of  data  likelihood  and  model  likelihood.  The  data 
likelihood  was  determined  using  a  normal  mixture  model  for  the  distribution  of 
p  around  the  cluster  means.  The  model  likelihood  was  calculated  using  a  prior 
distribution  of  expected  cluster  positions,  resulting  in  optimal  p  positions  of  0.8 
for  the  homozygous  reference  cluster,  0.5  for  the  heterozygous  cluster  and  0.2  for 
the  homozygous  alternative  cluster. 
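
As a rough illustration of the K-means part of this procedure, the sketch below clusters p values in one dimension, starting from the expected cluster positions given above (0.8, 0.5 and 0.2). It is a deliberate simplification and an assumption on our part: the constrained regression step that re-estimates the common background and the likelihood-based choice of the number of clusters are both omitted, and all names are hypothetical.

```python
# Expected p for homozygous reference, heterozygous, homozygous alternative.
PRIOR_CENTRES = (0.8, 0.5, 0.2)


def cluster_genotypes(p_values, centres=PRIOR_CENTRES, max_iter=50):
    """Plain one-dimensional K-means on allele fractions.

    Returns (labels, centres), where label 0, 1, 2 indexes the clusters
    in the order of the starting centres.
    """
    centres = list(centres)
    labels = []
    for _ in range(max_iter):
        # Assignment step: nearest centre for every sample.
        new_labels = [
            min(range(len(centres)), key=lambda k: abs(p - centres[k]))
            for p in p_values
        ]
        if new_labels == labels:
            break  # cluster membership has converged
        labels = new_labels
        # Update step: move each non-empty cluster to the mean of its members.
        for k in range(len(centres)):
            members = [p for p, lab in zip(p_values, labels) if lab == k]
            if members:
                centres[k] = sum(members) / len(members)
    return labels, centres


labels, centres = cluster_genotypes([0.82, 0.78, 0.52, 0.47, 0.21, 0.17])
print(labels, centres)
```

Starting from the prior positions keeps the cluster labels aligned with genotypes, so no relabelling is needed after convergence.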

A genotyping quality metric was compiled for each genotype from 15 input
metrics  that  described  the  quality  of  the  SNP  and  the  genotype.  The  genotyping 
quality  metric  correlated  with  a  probability  of  having  a  discordant  call  between 
the  Perlegen  platform  and  outside  genotyping  platforms  (that  is,  non-Perlegen 
HapMap  project  genotypes).  A  system  of  10  bootstrap  aggregated  regression 
trees  was  trained  using  an  independent  data  set  of  concordance  data  between 
Perlegen  genotypes  and  HapMap  project  genotypes.  The  trained  predictor  was 
then  used  to  predict  the  genotyping  quality  for  each  of  the  genotypes  in  this  data 
set.  Genotypes  with  quality  scores  of  less  than  7  were  discarded.  Data  were 
analysed  for  227,876  SNPs  in  stage  1  and  12,026  (of  13,023  selected)  in  stage 
2,  for  which  the  call  rate  was  >80%. 

The 12,711 SNPs for stage 2 were primarily selected on the basis of a 1 d.f.
Cochran-Armitage trend test (11,809, all with P < 0.052). We also included 826
SNPs with P < 0.01 testing for the difference in frequency of either homozygote
between cases and controls (that is, assuming either a dominant or recessive
model) and 76 SNPs that achieved P < 0.01 on a Cochran-Armitage test, weighting
individuals by their family history score as above.

For  the  main  analyses,  we  discarded  SNPs  with  a  call  rate  <90%  in  stage  1  and 
95%  in  stage  2,  and  SNPs  with  a  deviation  from  Hardy-Weinberg  equilibrium 
significant at P < 0.00001 in either stage, leaving 205,586 SNPs in stage 1 and
10,621  SNPs  in  stage  2. 
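
The SNP-level filter described here can be sketched as below. The text does not state which test of Hardy-Weinberg equilibrium was used; a Pearson χ² test with 1 d.f. on control genotype counts is assumed for illustration, and the function names are hypothetical.

```python
import math


def hwe_p_value(n_aa, n_ab, n_bb):
    """Pearson chi-square (1 d.f.) test of Hardy-Weinberg proportions.

    Assumption: the source does not specify the HWE test; this is one
    common choice.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no deviation can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2))


def passes_qc(call_rate, control_counts, min_call_rate, hwe_threshold=1e-5):
    """Keep a SNP if its call rate is not below the stage-specific minimum
    (0.90 in stage 1, 0.95 in stage 2) and HWE is not rejected at
    P < 0.00001."""
    return (call_rate >= min_call_rate
            and hwe_p_value(*control_counts) >= hwe_threshold)


print(passes_qc(0.93, (250, 500, 250), min_call_rate=0.90))
```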

The  30  SNPs  included  in  the  stage  3  analyses  were  initially  selected  on  the  basis 
of  a  combined  analysis  of  stage  1  and  stage  2.  We  included  all  SNPs  achieving  a 
combined P < 0.00002 (based on either the Cochran-Armitage or 2 d.f. test, see
below). Following re-evaluation of the stage 2 genotyping by 5' nuclease assay
(Taqman, Applied Biosystems) using the ABI PRISM 7900HT (Applied
Biosystems), and exclusion of some samples, 16 of these SNPs were significant
at P < 0.00002 and 24 at P < 0.0002 (Supplementary Table 3). One additional
SNP,  rs3803662,  was  added  as  a  result  of  fine-scale  mapping  of  the  TNRC9/ 
LOC643714  locus. 

The 31 stage 3 SNPs were genotyped in 22 studies (including cases and controls
from SEARCH not used in stage 2, together with 21 other studies). For 18 of
the studies, genotyping was performed by 5' nuclease assay (Taqman) using the
ABI PRISM 7900HT or 7500 Sequence Detection Systems according to manufacturer's
instructions. Primers and probes were supplied directly by Applied
Biosystems  (http://www.appliedbiosystems.com/)  as  Assays-by-Design.  All 
assays  were  carried  out  in  384-well  or  96-well  format,  with  each  plate  including 
negative  controls  (with  no  DNA).  Duplicate  genotypes  were  provided  for  at  least 
2%  of  samples  in  each  study.  For  three  studies,  SNPs  were  genotyped  using 
matrix  assisted  laser  desorption/ionization  time  of  flight  mass  spectrometry 
(MALDI-TOF  MS)  for  the  determination  of  allele-specific  primer  extension 
products  using  Sequenom’s  MassARRAY  system  and  iPLEX  technology.  The 
design  of  oligonucleotides  was  carried  out  according  to  the  guidelines  of 
Sequenom  and  performed  using  MassARRAY  Assay  Design  software  (version 
1.0).  Multiplex  PCR  amplification  of  amplicons  containing  SNPs  of  interest  was 
performed  using  Qiagen  HotStart  Taq  Polymerase  on  a  Perkin  Elmer  GeneAmp 
2400  thermal  cycler  (MJ  Research)  with  5  ng  genomic  DNA.  Primer  extension 
reactions  were  carried  out  according  to  manufacturer’s  instructions  for  iPLEX 
chemistry.  Assay  data  were  analysed  using  Sequenom  TYPER  software  (version 
3.0).  One  study  used  both  the  Taqman  and  MALDI-TOF  MS  approaches.  The 
SNPs  genotyped  in  stage  3  were  also  regenotyped  in  the  stage  2  samples  using 
Taqman;  these  genotype  calls  were  used  in  the  overall  analyses  (Table  2, 
Supplementary  Table  3,  and  Fig.  2). 

We  eliminated  any  sample  that  could  not  be  scored  on  20%  of  the  SNPs 
attempted.  We  also  removed  data  for  any  centre/SNP  combination  for  which 
the  call  rate  was  less  than  90%.  In  any  instances  where  the  call  rate  was  90-95%, 
the  clustering  of  genotype  calls  was  re-evaluated  by  an  independent  observer  to 
determine  whether  the  clustering  was  sufficiently  clear  for  inclusion.  We  also 
eliminated  all  the  data  for  a  given  SNP/centre  where  the  reproducibility  in 
duplicate  samples  was  <97%,  or  where  there  was  marked  deviation  from 
Hardy-Weinberg equilibrium in the controls (P < 0.00001).

Fine-scale  mapping  of  FGFR2.  Initial  tagging  of  the  associated  region  was  done 
by  identifying  all  SNPs  with  an  m.a.f.  >  5%  in  the  HapMap  CEPH/CEU  set 
(Utah  residents  with  ancestry  from  northern  and  western  Europe).  We  then 
selected 7 SNPs (in addition to rs2981582) that tagged these variants with a
pairwise r² > 0.8, using the program Tagger (http://www.broad.mit.edu/mpg/tagger/;
ref. 37). To identify additional common variants within the 32.5 kb region of
linkage around the associated SNP, we resequenced 45 lymphocyte DNA samples
from a subset of European subjects also genotyped by HapMap and other publicly
available data sets. Seventy overlapping PCR amplicons were designed from
positions 123317613 to 123348192 of chromosome 10 (average amplicon size
650 bp, 160 bp overlap). M13-tagged PCR products were bidirectionally
sequenced using Big Dye 3.0 (Applied Biosystems) and processed using automated
trace analysis through the Cancer Genome Workbench (cgwb.nci.nih.gov).
Eighty-six per cent of the nucleotides across the region could be scored for
polymorphisms in at least 80% of subjects. This set gave a >97% probability of
detecting a variant with an m.a.f. > 5%. One hundred and seventeen variants
were identified, including 27 present in dbSNP but without individual genotype
information in European subjects, and an additional 46 not in dbSNP.
Individual genotype information was then compared and merged with publicly
available genotypes from Caucasian subjects (HapMap release 21 for 60 CEU
parents, 22 European subjects from the Environmental Genome Project (EGP)
resequencing effort (http://egp.gs.washington.edu/data/fgfr2/), and 24 European
subjects from Perlegen (retrieved through http://gvs.gs.washington.edu/GVS)).
There were 2 discrepancies among 389 genotype calls among subjects
in common between our resequencing effort and EGP or Perlegen data, and 10
out of 926 compared to HapMap genotypes.

On the basis of these data, we identified 28 SNPs correlated with rs2981582
with r² > 0.6. We then attempted to genotype these 28 SNPs, plus rs2981582, in a
subset of 80 controls from SEARCH and 84 controls from the Seoul Breast
Cancer Study. Twenty-two of the variants were genotyped using Taqman.
Four further variants (rs34032268, rs2912778, rs2912781 and rs7895676), which
were not amenable to Taqman, were genotyped by Pyrosequencing (Biotage;
http://www.biotagebio.com/). Assays were designed using Pyrosequencing
Assay Design Software 1.0. The remaining 2 SNPs (rs35393331 and
rs33971856) could not be genotyped using either technology and were excluded
from further analyses. We cannot therefore comment on their likelihood of being
the causal variant. Using these data, we selected tagging sets of 11 SNPs for UK
subjects and 14 SNPs for Korean subjects (including rs2981582), such that each
of the remaining variants was correlated with a tagging SNP with r² > 0.95 in the
UK study or r² > 0.86 in the Korean study. After genotyping the 11 tag SNPs in
SEARCH, two of these SNPs (rs4752569 and rs35012336) showed strong evidence
against being the causative variant and were not considered further. The
remaining 12 tag SNPs from the Korean subset were then genotyped in the
samples from the IARC-Thai Breast Cancer Study, the Breast Cancer Study in
Taiwan and the Multi-Ethnic Cohort (MEC), by Taqman.

Statistical methods. The primary test used for each SNP was a Cochran-Armitage
1 d.f. score test for association between disease status and allele dose.
In the combined analysis, we performed a stratified Cochran-Armitage test.
Stage 1 was given a weight of 4 in this analysis (corresponding to a weight of 2
in the score statistic), to allow for the expected greater effect size given the
inclusion of cases with a family history. In the stage 3 analyses, each study was
treated as a separate stratum, except for the MEC, in which the European
American and Japanese American subgroups were treated as separate strata.
For all studies except the MEC, individuals from a minor ethnic group for that
study were excluded. Per-allele and genotype-specific odds ratios, and confidence
intervals, were estimated using logistic regression, adjusting for the same
strata. The summary odds ratios in Fig. 2 are based on the data from the stage 3
studies only, to avoid the bias inherent in estimates from the stage 1 and 2 data
for SNPs exhibiting an association (the so-called 'winner's curse'). The effects of
genotype on family history of breast cancer (first degree yes/no) and bilaterality
were examined by treating these variables as outcomes in a stratified Cochran-Armitage
test.
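
A minimal sketch of a stratified 1 d.f. Cochran-Armitage trend test is given below, using genotype counts ordered by allele dose (0, 1, 2). It is illustrative only: the function names are hypothetical, the variance uses the usual large-sample form, and the weighting reflects our reading of the text, in which a weight applied to a stratum's score statistic is squared in its variance (so a score weight of 2 corresponds to a variance weight of 4).

```python
import math

DOSES = (0, 1, 2)


def trend_score(case_counts, control_counts):
    """Score U and null variance V of the trend test within one stratum."""
    r = sum(case_counts)
    s = sum(control_counts)
    n = r + s
    totals = [c + d for c, d in zip(case_counts, control_counts)]
    mean_dose = sum(x * t for x, t in zip(DOSES, totals)) / n
    # U = sum over cases of (dose - mean dose)
    u = sum(x * c for x, c in zip(DOSES, case_counts)) - r * mean_dose
    sum_sq = sum(t * (x - mean_dose) ** 2 for x, t in zip(DOSES, totals))
    v = (r * s / n ** 2) * sum_sq
    return u, v


def stratified_trend_test(strata, score_weights=None):
    """Combine strata: U = sum(w * U_s), V = sum(w^2 * V_s).

    `strata` is a list of (case_counts, control_counts) pairs.
    Returns (chi-square with 1 d.f., P value).
    """
    if score_weights is None:
        score_weights = [1.0] * len(strata)
    u_total = 0.0
    v_total = 0.0
    for (cases, controls), w in zip(strata, score_weights):
        u, v = trend_score(cases, controls)
        u_total += w * u
        v_total += w * w * v
    if v_total == 0:
        return 0.0, 1.0  # no variation in dose: nothing to test
    chi2 = u_total ** 2 / v_total
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Illustrative counts only: stage 1 given a score weight of 2.
stage1 = ((100, 180, 110), (130, 170, 64))
stage2 = ((1500, 1900, 590), (1600, 1800, 516))
print(stratified_trend_test([stage1, stage2], score_weights=[2.0, 1.0]))
```

With a single stratum the weight cancels, so the weighting only matters for how strata are balanced against each other.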

To assess the global significance of the SNPs in stage 3, we computed the sum
of the χ² trend statistics (excluding the 6 SNPs reaching genome-wide significance,
plus rs2107425 as it was in LD with rs3817198) over those SNPs (17 of 23)
for which the estimated odds ratios in stage 3 were in the same direction as the
combined stage 1/stage 2 (ref. 38). Under the null hypothesis of no association, the
asymptotic distribution of this statistic is χ² with n degrees of freedom, where
n has a binomial distribution with parameters 23 and 1/2. The significance of this
statistic was then assessed by computing a weighted sum of the tails of the
relevant χ² distributions.
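
The weighted sum of tails can be sketched as follows: the P value is the binomial-weighted average of χ² upper-tail probabilities over the possible numbers of concordant SNPs. The function names are hypothetical, and the closed-form χ² tail for integer degrees of freedom is used only to keep the example free of external libraries.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df == 0:
        return 0.0  # a sum over zero SNPs is identically zero
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def global_p_value(statistic, n_snps=23):
    """P value when the d.f. n follows Binomial(n_snps, 1/2)."""
    weights = (math.comb(n_snps, k) / 2 ** n_snps for k in range(n_snps + 1))
    return sum(w * chi2_sf(statistic, k) for k, w in enumerate(weights))


# Illustrative statistic only; the observed value is not given in the text.
print(global_p_value(40.0))
```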

For the fine-scale mapping of the FGFR2 locus, we first derived haplotype
frequencies using the haplo.stats package in S-plus (ref. 39), separately for the European
and Asian populations, using data from the case-control studies on whom the tag
SNPs were typed plus the 164 control individuals on whom all SNPs were typed.
These were used to impute genotype probabilities for each identified SNP in each
individual. We then used an EM algorithm to fit a logistic regression model
assuming that each SNP in turn was the causal variant, allowing for uncertainty
in the genotypes of untyped SNPs, and hence to determine the likelihood that
each SNP was the causal variant.
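
Given a maximized log-likelihood for each SNP under the model in which that SNP is causal, the 100:1 exclusion rule used in Fig. 3 can be sketched as below. The SNP labels and log-likelihood values are hypothetical placeholders, not results from the study, and the EM fit itself is not reproduced.

```python
import math


def candidate_causal_snps(log_likelihoods, odds=100.0):
    """Return SNPs not excluded at the given likelihood odds relative to
    the best-supported SNP."""
    best = max(log_likelihoods.values())
    cutoff = best - math.log(odds)
    return sorted(snp for snp, ll in log_likelihoods.items() if ll >= cutoff)


# Hypothetical values for illustration only.
example = {"snp_a": -1000.0, "snp_b": -1003.0, "snp_c": -1006.0}
print(candidate_causal_snps(example))
```

Working on the log scale avoids underflow, since the likelihoods themselves are far too small to represent directly.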

Coverage  of  the  stage  1  tagging  set  was  estimated  using  HapMap  phase  II  as  a 
reference.  We  based  estimates  on  2,116,183  SNPs  with  an  m.a.f.  of  >5%  in  the 
CEU  population.  Of  the  SNPs  successfully  genotyped  in  stage  1,  187,663  were 
also  on  HapMap.  For  those  SNPs  not  on  HapMap,  we  identified  ‘surrogate’  SNPs 
that were in perfect LD based on genotyping of 24 Caucasians by Perlegen
Sciences (269,203 SNPs; ref. 18). To estimate coverage, we determined the best pairwise
r² for each HapMap SNP and each tag SNP or a surrogate SNP, using the
HapMap CEU data. This coverage was summarized in terms of the distribution
of r² by allele frequency in 10 categories.
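
The best pairwise r² per HapMap SNP can be computed as in the sketch below. It assumes phased haplotypes coded 0/1 for each SNP, which is a simplification of working from HapMap genotype data, and the function names and the 0.8 summary threshold are illustrative choices rather than values taken from this analysis.

```python
def r_squared(hap_a, hap_b):
    """Pairwise r² between two biallelic SNPs from phased 0/1 haplotypes."""
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one SNP is monomorphic in this sample
    d = p_ab - p_a * p_b
    return d * d / denom


def best_r2_per_target(targets, tags):
    """For each target SNP, the highest r² with any tag (or surrogate) SNP."""
    return {
        name: max(r_squared(hap, tag_hap) for tag_hap in tags.values())
        for name, hap in targets.items()
    }


def coverage(targets, tags, threshold=0.8):
    """Fraction of target SNPs whose best r² reaches the threshold."""
    best = best_r2_per_target(targets, tags)
    return sum(value >= threshold for value in best.values()) / len(best)


# Toy haplotypes for illustration only.
targets = {"t1": [0, 0, 1, 1], "t2": [0, 1, 0, 1]}
tags = {"tag": [0, 0, 1, 1]}
print(best_r2_per_target(targets, tags), coverage(targets, tags))
```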

To estimate the power to detect each of the associations found, we computed
the non-centrality parameter for the test statistic at each stage, based on the
per-allele relative risk, allele frequency and r². This was used to estimate the power for
a given r², based on a simulated trivariate normal distribution for the score
statistics after each stage to allow for the correlations in the test statistics. We
assumed a cut-off of P < 0.05 for stage 1, P < 0.00002 for stage 2 and P < 10⁻⁷
for stage 3 (the first is slightly conservative, as more SNPs than this were actually
taken forward). The overall power was obtained by averaging the power estimates
for each r² over the distribution of r² obtained from the HapMap data,
applicable to a SNP of that frequency.

The  expected  number  of  significant  associations  after  stage  2  (Table  1)  was 
calculated  using  a  bivariate  normal  distribution  for  the  joint  distribution  of  the 
(weighted)  Cochran-Armitage  score  statistics  after  stage  1  and  after  both  stages, 
using  a  correlation  of  0.525  between  the  two  statistics  (reflecting  the  weighted 
sizes  of  the  two  studies).  These  calculations  were  based  on  the  205,586  SNPs 
reaching the required quality control in stage 1. Of these, 11,313 reached a
P < 0.05, of which 7,405 (65.5%) were successfully genotyped to the required
quality control in stage 2. Thus the expected number reaching a given significance
level with good quality control was calculated from the total number
expected to reach this level × 65.5%. We adjusted the variances of the test
statistics, separately for stages 1 and 2, using the genomic control method (ref. 22).
The adjustment factor, λ, was estimated from the median of the smallest 90%
of the test statistics for SNPs typed in that stage, divided by the predicted median
for the smallest 90% of a sample of χ² (1 d.f.) distributions (that is, the 45th percentile
of a χ² distribution with 1 d.f., 0.375).
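
The truncated-median estimate of λ can be sketched as follows, using only the Python standard library. The function names are hypothetical. Under this formula the 45th percentile of a χ² distribution with 1 d.f. evaluates to roughly 0.357 rather than the 0.375 quoted, so the constant should be checked against the original calculation.

```python
from statistics import NormalDist, median


def chi2_1df_quantile(prob):
    """Quantile of a chi-square distribution with 1 d.f., obtained from
    the standard normal quantile: if Z ~ N(0, 1) then Z² ~ chi-square(1)."""
    z = NormalDist().inv_cdf((1 + prob) / 2)
    return z * z


def genomic_control_lambda(test_statistics, fraction=0.9):
    """Median of the smallest `fraction` of the statistics divided by the
    corresponding null quantile (the 45th percentile when fraction = 0.9)."""
    ordered = sorted(test_statistics)
    keep = ordered[: max(1, int(len(ordered) * fraction))]
    return median(keep) / chi2_1df_quantile(fraction / 2)


def adjust(statistic, lam):
    """Inflate the variance by lambda, i.e. divide the chi-square by it."""
    return statistic / lam


# Statistics placed at evenly spaced null quantiles give lambda close to 1.
null_stats = [chi2_1df_quantile((i + 0.5) / 1000) for i in range(1000)]
print(round(genomic_control_lambda(null_stats), 3))
```

Restricting the median to the smallest 90% of statistics keeps genuinely associated SNPs in the upper tail from inflating the estimate.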